Preface 


Virtual reality (VR) technology has been developed commercially since the early 
1990s [1]. Yet it is only with the growth of the Internet and other high-bandwidth 
links that VR systems have increasingly become networked to allow users to share 
the same virtual environment (VE). Shared VEs raise a number of interesting 
questions: what is the difference between face-to-face interaction and interaction 
between persons inside VEs? How does the appearance of the “avatar” - as the 
graphical representation of the user has become known - change the nature of 
interaction? And what governs the formation of virtual communities? 

This volume brings together contributions from social scientists and computer 
scientists who have conducted research on social interaction in various types of 
VEs. Two previous volumes in this CSCW book series [2, 3] have examined 
related aspects of research on VEs - social navigation and collaboration - although 
they do not always deal with VR/VEs in the sense in which the term is used here (see the 
definition in Chapter 1). The aim of this volume is to explore how people interact 
with each other in computer-generated virtual worlds. 

From the perspective of social science research, human interaction in 
computer-generated environments poses challenging problems: how do we make sense of the 
interaction between people who encounter each other only in the form of graphical 
representations? And what are the research ethics of studying the online behavior 
of subjects who may or may not be aware that they are being studied? There are 
also various phenomena that make for interesting comparisons with social life in 
the real world: what property rights are attached to virtual spaces, objects, and 
buildings? What rule enforcement mechanisms are there in worlds in which avatars 
can reappear in different guises? Perhaps most importantly, there are the questions 
around the relationship between the technology and social interaction in virtual 
environments: how do different VR/VE systems affect behavior between 
avatars? And related to this, how can we improve the technology for effective 
interaction and cooperation among users? 

There are two main reasons for studying social interaction in VEs: one is that it 
may help us to improve the systems and their uses. The second is that social 
interaction in virtual environments will provide insights into different forms of 
computer-mediated interaction: shared VEs can offer instructive points of overlap 
with other related technologies, and they may even, as Chapter 8 shows, provide a 
tool for studying real social interaction. 


Scope and Aims 

There are several ways to give an indication of the scope of this volume. One is by 
the range of methods and disciplinary perspectives involved. Several of the 
contributors (Becker and Mark, Taylor, Jakobsson, and Hudson-Smith) can be 
described as participant observers, making sense of their encounters with other 
“inhabitants” of virtual “communities”. Other researchers adopt an experimental 
approach, assessing the results of trials in which they vary the 
technologies and tasks (Slater and Steed, Blascovich, and Salinas). Still others use 
quantitative analysis of logged data (Smith, Farnham and Drucker) or combine 
various methods (Axelsson). The researchers also have different disciplinary 
backgrounds, including sociology, psychology, computer science, architecture, and 
communication studies. 

Another indication is the variety of technologies. Head-mounted display systems 
(HMDs), which used to be the epitome of VR, are a small niche within the VR 
landscape nowadays. In this book, two chapters include research with HMDs, one 
includes a haptic system, and the other chapters deal with networked desktop 
systems and immersive projection technology (or CAVE-type) systems. However, 
the technology as such is not the focus of this book; instead, the main focus is on 
how people interact in the VEs created by these systems. 

I should mention that some of the contributions to this volume fall outside the 
definition of VR technology or VEs given in Chapter 1 because the VR systems 
they discuss offer only a second-person view - the users see their avatar on screen 
rather than having a first-person perspective on the VE. Yet, as we shall see, these 
chapters are sure to be relevant for VEs that fall within the definition, and we shall 
also see that the boundaries between different systems are often fluid. 

There is also a wide range of ways in which VEs are used. Several chapters in 
this volume deal with internet-based desktop VEs with large groups of users - in 
the case of online VEs, thousands of users - who use the system mainly for 
socializing. I shall refer to these as “social” VEs (just as certain MUDs, text-only 
Multi-User Dimensions or Dungeons, have become known as “social” MUDs) 
because although they are used for gaming, teaching and other purposes, they are 
generally used for “socializing” (Smith, Farnham and Drucker refer to these 
systems as “graphical chats”). Other chapters deal with small groups in immersive 
systems that are still often used in labs and for demonstrations, and may have a 
variety of applications. 

Nevertheless, the reader will notice that not much attention is devoted to 
applications. Most of the chapters deal with experimental scenarios or with social 
VEs. This reflects the fact that there are not yet many regular uses of shared VEs - 
apart from online social VEs. This should not detract from the importance of this 
field: shared VEs are likely to become widely used in a range of practical settings. 
Projects to develop highly immersive shared workspaces - also known as 
tele-immersion - are perhaps the best example [4]. However, it is not yet clear which 
uses of shared VEs are most suitable for applications, and one upshot of this 
volume will be to shed light on this question. 


An Overview of the Contributions 

In my introductory chapter, I provide a survey of the research issues in the field: 
presence, copresence, communication, different types of VEs and social 
configurations, as well as their relation to offline life. I also sketch how these 
issues can be integrated within an overall framework. Research on social 
interaction in VEs will no doubt continue in many directions, but it is possible to 
see that different studies are beginning to build on and complement one another. 

In Chapter 2, Becker and Mark compare the social conventions - for greetings, 
communication, and interpersonal distance - in three online shared VEs: Active 
Worlds, OnLive Traveler, and LambdaMOO (LambdaMOO falls outside my 
definition of VR/VEs since it is text-only). They argue that shared norms in VEs 
are impossible without the context of real world norms. This is an important point, 
but it also invites us to think about the conditions in which people interact in 
shared VEs without meeting face-to-face at all: what kind of shared norms do we 
import into shared VEs? 

In relation to their chapter, I would also like to highlight their findings about 
OnLive Traveler, which, unlike most online VE systems, also includes audio 
communication. What is interesting in this audio setting is that avatar heads do 
follow the conventions of face-to-face conversations in certain respects - for 
example, keeping a certain amount of distance vis-a-vis each other (compare 
Chapters 8 and 12). On the other hand, other conventions, such as reciprocating 
smiles, are not followed. Here we can clearly see that people make an effort, or feel 
constrained to make an effort, to reproduce their real-life behaviors in the VE, even 
though there is no “objective” need to do so, and that they also allow themselves to 
depart from other real world behaviors. Chapter 2, like several others in this 
volume, makes a useful start in sorting out which conventions are followed and 
which are not. 

How people interact with each other relates to their avatar embodiment, and 
Taylor gives several examples of how people use their bodies in “The 
Dreamscape” in Chapter 3. She shows how participants in this online VE 
customize their avatars and analyzes how avatar bodies can convey group 
membership and other messages. Taylor argues that people's relationships with 
their virtual bodies can take various forms, but that these forms are subject to 
certain constraints and possibilities. Other chapters in the volume also deal with the 
topic of how avatar appearance influences interaction (Chapters 8 and 9), or what 
kind of avatar appearances people prefer (Chapter 6). But avatar appearance will 
influence interaction in all shared VEs, and there is still much research to be done 
on pinning down this influence. 

In Chapter 4, Jakobsson tells the story of an act of destruction in his online world 
in the “Palace” system. He recounts how his attitude to the miscreant changed, and 
how he came to sympathize with someone whose identity was apparently very 
much wrapped up in his online life. This is also an illustration of how participants 
misperceive each other’s roles and status, misperceptions that they nevertheless 
overcome and adapt to over the course of their online lives. 

Chapter 5 is also a chronicle of online life, in this case of 30 days in Active 
Worlds (AW). Hudson-Smith’s experiment allowed anyone to build freely within 
his AW world. The resulting social dynamic, with destruction threatening the 
project and bonds subsequently restored, is a nice illustration of how the process of 
building, and thus shaping the VE, creates a stake in the online community. We 
will need many more such studies before we can build up a systematic picture of 
social life in VEs. 
Chapters 4 and 5 can be seen as two sides of the issue of deviance and belonging 
in virtual worlds: in Chapter 5, from the point of view of the community; in 
Chapter 4, in relation to the individual. Both point to a widely discussed 
phenomenon: how people misbehave in VEs. But they also point to a less well-known 
phenomenon: that people in VEs develop a strong sense of belonging, or a 
strong commitment to VEs, and that this imposes limits on “bad” behavior. 
Chapters 4 and 5 can thus also be seen as ripostes to the objections that researchers 
in this field often meet; namely, that online life is “only” virtual, or that it is only a 
game. As these chapters show, from the point of view of participants, shared VEs 
are often much more. 

The Social Computing group at Microsoft Research contributes two papers to 
this volume. The first summarizes the “lessons learned” from the experience with 
Microsoft’s V-Chat system and their Virtual Worlds Platform. V-Chat has been a 
mixed success; it has not been as popular as other online VEs like AW or “The 
Dreamscape”. Nevertheless, it has allowed the researchers on this project to 
implement a series of improvements in the VE in an iterative fashion. These 
improvements include supporting the formation of groups and of status, and 
thereby encouraging responsible behavior. They also make recommendations for 
how to provide participants with a persistent identity, and about avatar appearance 
and the appearance of the VE. Many of their suggestions are about how the sense 
of community could be enhanced, and here research on shared VEs adjoins the 
broader area of research on “online communities”, including text-based and other 
internet-based communities (see [4, 5]). Their research is also closely related to 
usability studies, and especially interface usability. Again, for shared VEs, this 
research is still at an early stage, but it is bound to grow as shared VEs proliferate. 

A different kind of usability comes into focus when we consider the long-term 
uses of shared VEs. This issue has raised larger questions and fears, such as 
addiction and health and safety concerns. But, as we point out in Chapter 7, though 
these concerns may be relevant in some cases (especially with HMD systems), it is 
also useful to study this phenomenon empirically - since almost no such studies 
exist about shared VEs. Our very exploratory study shows that issues that we did 
not expect at the outset - such as how well people get used to each other’s 
presence, and how important it is to have a good sense of the other’s intentions - 
proved much more important than many of the more far-flung issues that have 
been envisioned for long-term uses. Moreover, the problems around how to make 
social interaction more effective between participants are only partly issues that 
can be solved by technical improvements. Instead, they will, to a large extent, be 
solved by user awareness of the possibilities and constraints of small group 
meetings in shared VEs - or through users adapting their behavior to these 
environments. 

Slater and Steed (Chapter 9) review a series of experimental studies that they 
have undertaken. Their experiments were concerned with leadership and social 
discomfort in small groups, gaze direction, training actors in a VE, and public 
speaking in front of a simulated audience. Their studies provide lots of detailed 
insights into how people interact in VEs. However, apart from their findings in 
individual studies, the main point to highlight here may be that shared VEs provide 
such a strong sense of being there with other people, and thus allow strong 
interpersonal feelings like embarrassment or subtle acting cues to be conveyed - 
against what we might expect from what are often regarded as “cold” or 
impersonal environments in which many interpersonal cues are missing. 

Blascovich and his colleagues have begun to use shared immersive VEs as a 
laboratory for studying social interaction. In Chapter 8, he describes several 
experiments to measure social influence, and develops a model for identifying the 
factors which influence “social presence”. As Blascovich points out, VEs offer a 
number of possibilities that real world research does not, which include being able 
to vary factors in the interaction and in the environment in a precise and replicable 
way, as well as being able to record the whole of the interaction. 

Using VEs for this type of research, as Chapter 9 also shows, has enormous 
potential for social and psychological research. Moreover, this is an area where 
experimental research and the study of how people behave in the naturalistic 
setting of an ongoing VE can inform each other. To mention only one example: 
Chapters 8 and 12 both analyze proxemics, or how people move in relation to each 
other - the former in an experimental setting and the latter based on data of V-Chat 
users. In relation to a number of issues - how closely people stand to each other, 
how they face each other, and how they form groups - these two types of studies 
will yield complementary insights and provoke new research questions. 

Apart from reporting on two sets of experiments in shared VEs, Salinas, in 
Chapter 10, also provides a good overview of research on copresence or social 
presence, and how this relates to studies of different modalities of communication 
and collaboration in computer-mediated communication. Then she gives the results 
of two studies: the first compared voice, text and video communication in a 
decision-making task in Active Worlds, and the second a task moving objects with 
and without haptic force feedback (the sense of touch). Apart from contributing to 
a growing literature on shared VEs that examines the interplay between technology 
and collaboration, Salinas also provides insights that will be relevant to many 
practical uses of shared VEs. 

This is also a good place to point out that one argument that is often made in this 
context - that it all depends on the task, on the technology, on the participants, etc. 
- is obviously unsatisfactory, since we will need to have valid generalizations 
about the capabilities of different media. In the case of shared VEs (the majority of 
VR/VE systems are visual), a growing body of research shows that for highly 
spatial tasks, and tasks with a narrow focus, shared VEs are highly effective [7]. 
Experimental work of the type represented here by Blascovich (Chapter 8), Slater 
and Steed (Chapter 9), and Salinas (Chapter 10) will allow us to build up 
knowledge that can be applied across different shared VE settings and tasks. 

In Chapter 11, Axelsson offers a subtle analysis of status differences or 
stratification in VEs. Her discussion covers status and stratification imported from 
offline behavior, technologically created stratification, and the stratification created 
within the VEs themselves. Her examples relate to language barriers, to different 
roles and privileges, and to how different technological systems can affect the 
capabilities of participants. It may, of course, often be hard to distinguish the 
different causes of stratification in shared VEs, but they need to be disentangled if 
we want to be able to understand them correctly and overcome them. 
Recommendations for how it is possible to create more equal shared access - if this 



xiv The Social Life of Avatars 

is a desirable goal - will provide a useful starting point for researchers and 
developers who are trying to improve shared VEs. 

Chapter 12, the second chapter by the Social Computing Group at Microsoft 
Research, is a detailed quantitative investigation of Microsoft’s V-Chat. This 
analysis was made possible by logging data about avatar behavior. The results 
provide insights into proxemics, how people participate, and much else. This is a 
good example of what a quantitative study and logging of a VE can achieve, and 
although there are still very few studies of this type, this is bound to become a 
growing and rewarding area of research. It will also become important to combine 
the insights from such quantitative studies with the results from experimental 
studies and from qualitative or participant observation research. 

Finally, I would like to acknowledge that the research at Chalmers University 
represented in this volume has been partly supported by the Swedish Transport and 
Communications Research Board (KFB, now VINNOVA). I would also like to 
thank all the contributors for their sterling efforts, and Rosie Kemp, Melanie 
Jackson, Karen Borthwick and Joanne Cooling at Springer for their superb help. 

Ralph Schroeder 
Gothenburg, September 2001 
Chapter 1 

Social Interaction in Virtual 
Environments: Key Issues, Common 
Themes, and a Framework for 
Research 

Ralph Schroeder 


In this chapter, I will give an overview of some central issues in research on shared 
virtual environments (VEs) - including “presence”, “copresence”, communication, 
and small and large group dynamics - across a range of virtual reality (VR) 
technologies and different conditions under which they are used. I will discuss 
different studies of the interplay between technological systems and their social 
implications, and how sociological insights about interaction in the real world can 
be brought to bear on interaction in VEs. Finally, I will argue that making links 
between different areas of research can lead to a better understanding of social 
interaction in VEs. 


1.1 Background 


In the early 1990s, the dominant image of VR, and what most laboratories and 
developers focused on, was of single-user head-mounted display (HMD) systems 
[1,2]. Nowadays, there is a range of technologies, from expensive and immersive 
projection technology (IPT) or CAVE-type [3] room-size VR systems in which the 
environment is projected onto several walls, via HMDs, to free VR software that 
runs on desktop personal computers (PCs). 

Only since the mid-1990s, with the popularity of the Internet, has it become 
feasible to link many users simultaneously in shared or multi-user VEs. Today 
there are dozens of internet-based VEs that can be run on PCs, and in which 
hundreds of thousands of participants have created virtual social institutions such as 
shopping malls, churches, museums, classrooms, and the like. There are also 
dozens of trial systems being developed in computer science laboratories around 
the world that aim to develop shared VEs for a variety of purposes, among them 
virtual business meetings, scientific co-visualization, virtual therapy, and 
entertainment. These experimental systems often make use of more complex 
technological systems such as high-end computer graphics workstations, HMDs, 
and a host of other display systems, input/output devices, and computer graphics 
software. 

Despite this proliferation of technologies, and although the word “virtual” has 
come to be used in lots of different ways, there is nevertheless a core area of 
research on multi-user VR/VEs that most researchers would recognize. I have 
previously defined virtual reality technology as “a computer-generated display that 
allows or compels the user (or users) to have a feeling of being present in an 
environment other than the one they are actually in and to interact with that 
environment” [1]. This definition is close to that of most researchers in the field, 
and it is also grounded in a particular understanding of the social implications of 
new technologies. Shared VR technology, or shared VEs, can therefore be defined 
as VR systems in which users can also experience other participants as being 
present in the environment and interacting with them. 


1.2 Methods and Approaches 

There are two main methods for studying social interaction in VEs. Experimental 
methods typically make use of “purpose-built” environments to study a controlled 
set of conditions, whereas qualitative methods, such as participant observation, are 
often used to study “naturalistic” settings. This need not be so: Chapter 12 in this 
volume examines a “naturalistic” VE setting in a quantitative way. Nevertheless, 
most studies are either based on short and controlled trials or on longer-term 
observations of what people do in ongoing VE settings. 

Some research areas lend themselves more to one or other method. It is difficult 
to envisage how it would be feasible to study how people build complex virtual 
settlements by means of experimental methods, or under controlled conditions - 
though Chapter 5, “30 Days in Active Worlds”, comes close to being an experiment in 
how people build in VEs. Or again, experimental results about how people interact 
in short collaborative tasks (see, for example, Chapters 9 and 10) - relating to how 
people work together, for example - might not apply if the setting were a more 
“naturalistic” one, or one where the subjects were not under the experimenter’s 
gaze or influenced by their instructions. It is difficult, however, to make trials 
natural, or to carry them out over longer periods. 

In our studies, and in the chapters in this volume, a variety of methods have been 
employed, including various kinds of experimental studies and forms of participant 
observation. The latter has involved spending a long time, especially in Active 
Worlds (AW) - one of the most interesting online VEs which is discussed in 
several chapters - taking detailed notes on particular phenomena that are of 
interest, or conducting semi-structured online interviews with users. Experimental 
studies often vary the conditions - say, with different VR systems - and compare 
the results. These studies also often make use of questionnaires to get the responses 
of the “subjects” and sometimes use audio or video recordings. 
These different methods point to the variety of approaches in the study of shared 
VEs: on the one side, there is the more technically oriented literature which often 
comes from researchers in computer science departments and covers VR 
technology, collaborative virtual environments, human factors, computer- 
supported cooperative work, and the like. On the other side, there is the social 
science literature around MUDs (Multi-user Dungeons or Dimensions), identity on 
the Internet, or new media and society [4, 5, 6]. In this case, researchers often come 
from sociology or media/communications studies departments. 

Clearly, it will be useful to continue with a variety of methods, but it is worth 
making some brief comments. One is that the questions in studying social life in 
virtual environments are still emerging. As we shall see, the findings of studies are 
often from initial trials or from early uses of systems. Another is that a number of 
questions about research ethics and methods remain, as we found, for example, in 
our study of a church service in an online shared VE: should the settings and 
informants be treated in the same way as in the real world? Are informants who are 
only encountered virtually reliable? [7]. Finally, it can be hoped that findings using 
different methods will have the positive result that they complement each other, 
though it remains to be seen how well-integrated research on social interaction in 
VEs will become. With this, we can turn to the key substantive issues in shared 
VEs. 


1.3 Presence and Copresence 


“Presence” is a term that will be familiar to VR/VE researchers, but it will not be 
familiar to those outside of this research community [8]. VR technology, as 
indicated by the definition given earlier, is about “being there”: presence is 
therefore partly to do with the technology, and partly to do with the participants’ 
state of mind. A recent overview of research [9] discusses several concepts and 
ways of measuring “presence”. This overview also covers some commonly used 
indicators of presence, such as “immersion” and “involvement” in the 
environment. A further debate that they review is between “subjective” measures 
of presence, which are often obtained by means of questionnaires, as against 
“objective” measures, which entail, for example, the timing of task performance or 
heart-rate measurements. 

Much of the experimental research on presence to date has been on immersive 
systems (IPT systems, HMDs), or comparisons with desktop systems. This 
research includes several studies where participants carry out a task first in one 
system and then in the other, or where participants using one type of system 
collaborate in the VE with participants using a different type of system. These 
studies often show that participants experience a greater sense of presence in more 
immersive systems than in less immersive ones. 

However, it is important to broaden this discussion. Research with immersive 
systems typically involves short and controlled trials and a particular task. Yet 
users of AW, who typically spend a long time in this online shared VE, often 
without a particular task except socializing, surely also have a sense of “presence”. 
They may be less focused on their activity, and have a system with poorer 3D 
graphics - and since they may also be using the AW VE in a fairly routine, 
on-again/off-again manner - they may not think of themselves as being “present” in 
the same way as users of immersive systems. Nevertheless, they clearly have a 
sense of “being there”, even if “presence” is not an issue for them. Still, 
“presence” is typically discussed in the context of immersive rather than desktop 
systems [9]. 

The same point can be made differently: in presence research that has involved 
short experimental trials, users will typically answer questions about presence on 
Likert-type scales. However, if, for example, users regularly spent time in a highly 
immersive VE such as an IPT system, would they respond to such questions about 
presence differently? We can see then that the immersiveness of the technology per 
se may only be one dimension of presence, and the “mundaneity” of use may be 
another. 

Presence thus depends on a variety of factors, including the task, the VE, and in 
shared VEs, on copresence (as we shall see in a moment), and these factors will 
often outweigh the technology in affecting presence. Ultimately, if we want to 
measure different degrees of presence in VEs objectively, we may only be able to 
do so, as Ellis has argued [10], by measuring different variables against each 
other (Ellis uses the notion of “equivalence classes”). In other words, it may be 
necessary to create different VE scenarios that are comparable, and systematically 
measure presence in one against the other, not only in terms of performance as 
Ellis suggests, but also in the light of other variables. Another way to study 
presence will be to compare presence and interaction in different types of VEs with 
equivalent real scenarios [11, 12], as well as with other mediated environments. A 
combination of these methods will ultimately lead to a more comprehensive and 
thorough understanding of presence. 

Note that presence does not depend on the fidelity or “realism” of the VE: a 
“fantastical” or “abstract” VE can also provide a sense of “being there” for the 
user. Moreover, a number of studies have shown that presence does not necessarily 
increase task performance [9]. The reason for this may be that users need to divide 
their attention between the environment and the task in a situation where both are 
highly engaging. 

This brings us to “copresence”: presence, or “being there”, and copresence, a 
sense of “being there together”, are bound to be closely related. Again, we can 
initially widen the discussion instead of focusing exclusively on the results of 
experimental studies. For other media, issues similar to “copresence” are often 
discussed under the rubrics of “social presence” or “media richness” (see Chapter 
10 in this volume and [6] for reviews). Shared VEs are a rich medium in the sense 
that they allow people to interact via several senses. In the case of most of the VEs 
treated in this volume (with the exception of haptic VEs, see Chapter 10), they 
allow people to interact via audio/text and via a 3D visual environment. This sets 
shared VEs apart from telephony, video conferencing, and other media of 
communication - though whether they are similarly useful or enjoyable media 
remains to be seen. Nevertheless, it is a popular reaction to shared VEs, especially 
among novice users, to comment on how lifelike they are. 
The richness of this medium has been demonstrated in various ways. Slater and 
Steed (Chapter 9), for example, have shown that people have a very strong reaction 
to others, even if they are merely computer-generated agents. And as their acting 
trials show, forms of interaction which require sophisticated social cues - such as 
acting together - are feasible in shared VEs. Another example is our trial, carried 
out jointly with Steed and Slater, with networked IPT systems, in which we 
demonstrated that certain tasks can be carried out remotely as if “being there 
together” [13]. In this trial, we linked two IPT systems to allow two people - one in 
London, one in Gothenburg - to collaborate on solving a Rubik’s cube-type puzzle 
(see Figure 1 in Chapter 11). We demonstrated, by comparing this with the 
equivalent task carried out face-to-face with cardboard boxes, that such a highly 
spatial and collaborative task can be done just as effectively in networked VEs as 
in a real face-to-face setting. 

Some elements of shared VEs, on the other hand, detract from the richness of the 
medium: one is that many social cues are missing. For example, communication in 
shared VEs is often via voice, but many bodily cues are missing. And although 
non-verbal communication is sometimes used, some studies suggest that it is not 
used as much as in face-to-face interaction - even if many of the rules of 
non-verbal communication, such as turning your gaze in the direction of your intended 
audience, are adhered to (see Chapters 2 and 12). 

Many rules that govern copresence will be affected by technology, and here 
systematic comparison between different technologies will be useful. This is an 
obvious point, but it deserves restating here in order to stress that this is not just a 
question of “high-tech” versus “low-tech”, or highly immersive systems versus 
desktop systems. Rather, as Chapters 5 and 11 show, there are many features of 
“low-tech” desktop systems, such as access privileges or technology for the 
appearance of the environment, that have consequences for “copresence”. 

Nonetheless, shared VEs often combine a high degree of presence with a high 
degree of copresence because the sense of being in another place and of being there 
with another person reinforce each other. It may seem self-evident that presence 
and copresence should go together, but although we have some studies that point in 
this direction (Chapter 9, [13, 14]), we lack research, again, of the type suggested 
by Ellis, whereby a number of comparable settings are studied against each other. 

It also seems possible that the effect of copresence may “wear off” as the 
novelty of the medium wears off. At the same time, it seems likely that copresence 
will increase with the degree to which copresent users establish strong 
relationships in VEs. The first is something that shared VE researchers will know 
from their own experience; the second is clear from interviews with long-term 
users [15]. 

As in the case of presence, then, it appears that users are able to cope well with 
the absence of certain features of the real world or of face-to-face copresence - 
while they also need other features. As Buescher et al. [16] have argued, 
participants in shared VEs need at the very least a reciprocity of perspectives to 
make sense of each other’s actions. Reciprocity is thus one of the most elementary 
building blocks of social interaction, and from an analysis of this reciprocity, and 
how it influences how copresent participants focus on or turn their attention away 
from others, it will be possible to build up a picture of more complex forms of 
interaction. 


1.4 Communication 

To study communication in shared VEs will require a combination of perspectives, 
including social psychology, sociological analysis of interaction, and 
communications studies approaches to different media. Shared VEs typically make 
use of either audio or text communication, plus non-verbal communication. 

First, I should explain briefly why text-based communication is included in a 
discussion of shared VEs. In a sense, text-based communication (as, for example, 
in AW) is not VR because it does not enhance - but rather detracts from - the 
sense of presence and copresence. Nevertheless, the reasons for including 
text-based communication here are: first, that it is widely used in large-scale 
internet-based VEs; second, that there is an extensive research literature on text-based 
computer-mediated communication (CMC) which, as several chapters demonstrate 
(especially Chapters 2 and 11), can be usefully brought to bear on social interaction 
in VEs; and, finally, that the study of VEs benefits not only from comparisons with 
other forms of CMC, but also from comparisons between different modalities inside VEs (see, 
for example, Chapter 10). Arguably, VEs will never provide completely “realistic” 
ways of interacting or communicating with others because a number of features of 
face-to-face interaction will always be lacking. It is therefore instructive to 
compare different modes of communication in VEs, for example text with voice 
(Chapters 2 and 10) or with face-to-face communication [17]. 

In relation to shared VEs that support audio communication, one finding that has 
emerged again and again (see the results of the COVEN trial [18], and Chapter 7 in 
this volume) is that the quality of the audio communication can be a major obstacle 
to collaboration and fluid interaction. It can be anticipated that this technical 
problem will be overcome, but there is also an implication for the design of shared 
VEs here: there is little point in developing a technologically sophisticated or 
visually complex shared VE unless the audio communication works well, since this 
is critical for effective or enjoyable interaction. 

Some evidence for low media richness or low social presence in VEs is that 
people do not use non-verbal communication as much as in face-to-face interaction 
- as mentioned earlier. This does not mean that they do not use their bodies to 
communicate (see Chapter 3 for examples). Nevertheless, the dearth of non-verbal 
communication needs to be set against the observation that users seem to be able to 
adjust easily to communication in shared VEs. 

Analyzing communication in VEs can also take place on different levels: for 
large groups like a population in AW, we can examine the use of language by 
looking at the encounters between different national languages, for example, or at 
greetings, or at the number of words per contribution, and how this might differ 
from “real world” conversation structure (Chapter 2, [17, 19]). In small groups, we 
can analyze whether the VR system used makes a difference to who dominates 
verbally in a task scenario - for example, in carrying out a spatial task which 
requires a lot of communication and where participants are using different systems 
[13, 14], one of which is better suited to the spatial task. Again, both types of 
study will be useful since effects such as increased or diminished equality due to 
technology will operate across both large and small groups (see Chapter 11). 


1.5 Avatar Appearance and the Appearance of the Environment 

One interesting “lesson learned” by the developers of Microsoft’s V-Chat system 
(see Chapter 6) is that people would like to be able to have greater control over 
their own avatar representation or have input into its design. This demand from the 
users is also supported in Chapter 3, and - as I found out in an interview with the 
developers of the AW system (see [20]) - it is also the most common request that 
AW users have. 

It is interesting to note that meeting this demand currently presents several 
technological challenges: one is to provide the user with the tools to create their 
own custom avatar. Another is to do with network capabilities: in shared VEs, 
should each avatar representation permanently reside on each of the other users’ 
computers? This would create memory problems if there were hundreds or 
thousands of unique users. Or, should each new avatar only be downloaded when it 
is used? This would avoid the storage problem on each computer, but it would 
require lots of bandwidth. A related issue here is the complexity of, or the amount 
of data required by, each avatar. Finally, a few customized avatars may not be a 
problem, as in the case of small groups, but in larger populations there will 
continue to be a trade-off between unique and complex avatars and technological 
capabilities. As it stands, therefore, it is only possible to have a small number of 
custom avatars. 
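The storage-versus-bandwidth trade-off described above can be made concrete with a rough calculation. The following Python sketch is purely illustrative: the avatar sizes, population figures, and function names are hypothetical assumptions, not data from V-Chat, AW, or any other system discussed in this volume.

```python
from dataclasses import dataclass


@dataclass
class AvatarScenario:
    """Hypothetical parameters for a shared VE; not figures from any real system."""
    unique_avatars: int           # custom avatars in the whole user population
    avatar_size_kb: float         # data needed to render one avatar (assumed)
    encountered_per_session: int  # distinct avatars one user meets per session


def storage_per_client_mb(s: AvatarScenario) -> float:
    """Option 1: every client permanently stores every custom avatar."""
    return s.unique_avatars * s.avatar_size_kb / 1024


def download_per_session_mb(s: AvatarScenario) -> float:
    """Option 2: a client downloads an avatar only when it first appears
    in a session (no caching across sessions is assumed)."""
    return s.encountered_per_session * s.avatar_size_kb / 1024


if __name__ == "__main__":
    scenarios = {
        "small group": AvatarScenario(4, 200, 3),
        "large population": AvatarScenario(5000, 200, 150),
    }
    for name, s in scenarios.items():
        print(f"{name}: store all = {storage_per_client_mb(s):.1f} MB, "
              f"download on demand = {download_per_session_mb(s):.1f} MB per session")
```

On these invented figures, permanent local storage grows with the size of the whole population, while on-demand downloading grows only with the number of avatars a user actually meets, at the cost of repeating the transfer in every session unless avatars are cached.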

No doubt there will be technical progress here. It is also possible to 
anticipate that avatars will feature a mixture of computer-generated representations 
and real-time video images of users [21] - so that avatars will range from 
cartoon-like, as they are now, to very realistic. Chapter 9 provides some images of avatars 
that are quite realistic. But as Cheng, Farnham and Stone have found (Chapter 6), 
users may want avatars that are neither too abstract nor too realistic. It is therefore 
too early to say how much avatar customization will in fact be demanded by users 
in systems where they are given a choice. 

Perhaps a mixture of “off-the-shelf” or ready-made avatars and customized 
solutions will emerge in VEs. In the meantime, several chapters in this volume 
(Chapters 3, 8, 9) provide some indications of the effects of avatar appearance on 
social interaction. My point, again, is that what needs to be considered here are not 
just the effects of avatar interaction on individual encounters, but also issues such 
as the influence of the persistence of avatar appearance in different conditions: for 
example, what kind of persistence do users need in order to recognize each other 
over repeated encounters? And what kind of diversity of avatar appearances is 
needed within both small and large groups for participants to be able to distinguish 
one another - and what diversity can they cope with? 

For the features of the landscape and the built environment in shared VEs too, 
there are points of overlap between online shared VEs with many users and 
immersive VEs for small groups. Chapter 5, for example, points out that there is a 
build-and-abandon attitude in AW. This applies to immersive VEs too, since many 
VEs are developed and built, often at great cost in terms of labor, but are only used 
for a few demonstrations - after which they are abandoned and never used again. 
Again, this observation has implications for the design of VEs. 

Environments with large populations often have extensive and varied landscapes, 
such as the hundreds of worlds in AW. Perhaps the easiest way to make this point 
for those who are not familiar with AW is to say that it would take many days to 
see the various sights and to become familiar with the social milieus that can be 
found in the many worlds that have been built in AW. And again, AW is only one 
- though perhaps the most interesting because it has largely been created by users - 
among several internet-based social VEs. 

Another feature that should be mentioned is the mixture in the environments of 
elements that imitate the real world as against those that depart from it; or, real 
versus imaginary VEs. Examples of “realism” include the way that the layout of 
densely populated areas imitates real world cities, the resemblance of many 
buildings in AW to real world buildings, and the furnishing of many houses with 
chairs and tables (which serve no function apart from decoration or orientation); 
examples of the “imaginariness” in AW are the frequent use of all-glass transparent 
buildings, buildings which imitate science fiction or which are built in the sky, and 
objects like waterfalls or flames in unlikely places. 

In experimental studies, the appearance of the VE is typically related to the topic 
under investigation: visualization, collaboration, acting rehearsals, etc. There are 
also some highly realistic environments for training and games; military 
simulations and internet-based games like Quake and Doom are good examples of 
these (though they fall outside the definition of VR given earlier). These may 
have a higher degree of “realism”, but they are often restricted to a particular 
functionality: the user must follow certain rules (in a game) or manipulate the 
environment by means of certain tools or weapons. 

Shared VEs that are used in experimental trials will often be more restricted in 
scope and more abstract. Online VEs, as mentioned, are very extensive and mix 
fantasy and realism. And although there are studies [22, 23] of the geography of 
(mainly online) shared VEs, we need a more comprehensive classification of the 
appearance of VEs. Such a classification would be useful because it would allow 
us to relate their appearance to how users interact with them: what features must 
the environment have in order to enable particular types of social interaction? This 
is an issue which goes beyond joint navigation or wayfinding [24], and there is 
often a mismatch here, in that environments are frequently too complex for 
users’ needs. 

Taking a broader view, it may again seem obvious that the appearance of the 
environment will affect not only navigability (the issue that has been most studied 
so far), but also how users interact with each other. As we shall see below, this is 
an issue that can be framed in terms of how much the appearance of the 
environment detracts from - or allows the user to focus on - interaction with other 
users. 


1.6 The Dynamic of Small and Large Groups 


Apart from copresence and communication, we also need to analyze social 
interaction per se. Several studies in this volume show that - in small groups and in 
large ones - users, to a large extent, follow the conventions of the real world. 
These conventions include keeping their distance from each other, turning to face 
their conversation partners, and so on (Chapters 2, 8, 9, 12). It is equally clear from 
a number of studies, however, that users do not follow these conventions in other 
respects: for example, they do not use gestures very much, they more readily 
abandon and destroy buildings (Chapters 4, 5), or they treat a person with more 
powerful VR technology as the leader, even though they don’t do this in the 
equivalent real situation [11]. 

This is an area where so far few links have been made between the rules 
governing small and large groups (but see [25]). Studies of shared VEs, and of 
CMC generally, tend to focus either on small groups of up to three or four, or they 
study large groups or populations in shared VEs or CMC. Some of the intermediate 
levels have been analyzed for other forms of CMC - such as the use of email tools 
in organizations - but this has not been done for VEs. In shared VEs, perhaps the 
closest we can come to this intermediate level are the inhabited TV trials [26]. It is 
interesting to note that in these settings, where there were dozens of participants, 
one limitation that became apparent was that not many could actively participate, 
and a divide emerged between active participants and onlookers. In other words, in 
shared VEs, as in the real world, the focus of attention needs to be concentrated on 
a few members, and it is difficult for people to participate actively in a large group 
event. 

It can be noted in passing that this also points to a limitation on the notion of 
“interactivity”, which is often used in discussions of VEs and other electronic 
media. As we can see, in shared VEs, even if users may experience more 
interactivity than with most other new media technologies, the 
interactivity with others inside VEs is subject to similar constraints as in real 
life - perhaps even greater ones. 

The gap between the study of small groups and larger populations is also 
characteristic of social science as a whole, where micro- and macro-level analyses are not well 
integrated (though see [27] and [28] for attempts in this direction). Nevertheless, 
there are crucial links here; for example, whether one’s role in a small group is 
recognized as much in a VE as in real settings (as some studies have analyzed) will 
also be significant for larger groups or populations. And vice versa - whether 
leadership roles or hierarchical roles are generally acknowledged in large groups in 
shared VEs as in the real world, or if there are status equalization effects (see 
Chapter 11) will also translate into small group behavior. These links between 
small and large groups apply to a range of issues, and eventually the study of 
shared VEs will have to bring them into a comprehensive framework of analysis. 
1.7 Relation to Offline Behavior and to Other Media 

One current limitation of the study of shared VEs is that we know little about how 
online behaviors affect users’ behavior offline. This is partly because shared VEs 
are often studied in the context of short task-related trials. Even where there is 
research on longer-term shared VE settings, such as online ones, there are no 
studies yet which relate this to the real-life context of users: in the case of “social” 
VEs, which are the most widely used, there are no studies which have related 
“online” and “offline” life, though there are some studies which have related the 
two for text-based social MUDs [29, 30]. And although there has been an overview 
of after-effects research, mainly for immersive systems [31], this has mostly been 
concerned with problems such as short-term disorientation or nausea, rather than 
with the effects of interacting with others in the VE or with the real-life contexts of 
users. 

I have argued elsewhere [25] that, in the end, the study of social interaction in 
VEs needs to be integrated with the study of the uses of other communications 
media and how these media, including VEs, fit into our everyday lives. By 
comparing shared VEs with other forms of CMC and other media, and combining 
this with studies which compare virtual versus real interaction during short trials 
and how this affects social interaction (such as leadership and embarrassment, see 
Chapter 9), we may eventually be able to relate virtual and real interaction more 
systematically. 


1.8 A Framework for Research: Frames, Focus, Roles, and Networks in VEs 

Identifying key issues and common themes provides the backdrop for sketching a 
framework that brings together the various facets of interaction in shared VEs. One 
way to bring insights from the social sciences to bear on shared VEs is to start with 
Goffman’s ideas about the “frames” of social interaction [32, 33]. For Goffman, 
frames are the stages on which we play out our social roles. However, in shared 
VEs, the way we act and interact with others is technologically mediated. Thus 
VEs have a different kind of “bandwidth” from real world frames for presenting 
the self to the other. When we enter a VE, a shift in the “frame” takes place, and 
the bandwidths (in the non-technical sense) of different types of VE vary a great 
deal between, say, highly immersive and non-immersive VEs (see Figure 1.1). 
Figure 1.1 Frames, focus, roles and networks. 


“Bandwidth” in this case is not the same as “media richness”, “communication 
modality”, or “social presence”. Instead, using the notion of frames with different 
bandwidths in a VE allows us to apply the rules of face-to-face interaction to 
interaction in different shared VE settings: that is, different VR systems or types of 
VEs provide different frames for our encounters. And, to anticipate, if this applies 
to individual encounters and how we present ourselves to each other, it will also 
apply to larger groups. 
Before we proceed further with this framework, two asides are necessary: the 
first is that although Goffman is often interpreted as arguing that “it all depends on 
context”, or that social interaction is “relative” to the particular “frame” in which it 
takes place, he can more accurately be regarded as advocating the systematic 
study of frames whereby generalizations can be made across different frames or 
contexts [28]. In other words, Goffman’s ideas can be seen as part of an objective 
social science which applies not just to individual frames of interaction but 
can also be incorporated into a more macro-analysis of social structures. Second, 
while Goffman did not apply his ideas to communications media, they can 
nevertheless be extended to provide powerful insights into the social implications 
of electronic media, although this is beyond the scope of this chapter (but see [34]). 

For Goffman, the next step in the analysis is to look for the focus of attention in 
the social interaction or in the encounter between people. In shared VEs, as we saw 
earlier, the degree of focus (and distraction) that is possible relates to presence and 
copresence - or here, to the bandwidth of the frame. So, for example, some VEs 
are (initially) so visually rich as to overwhelm the user. In other VEs, the 
environment may be abstract or information “poor”, leaving the user to focus on 
the task or on the interaction with the other participant(s). In other words, here the 
focus is on what people do together. In this way, the question becomes not so much 
“how present do users feel?” as “where is their attention focused in the VE?” 
(Benford and colleagues have also used the notion of “focus” in VEs [35], but they 
use it to analyze spatial orientation, not social interaction.) 

In certain small group studies, such as our Rubik’s cube trial (see Figure 1 in 
Chapter 11 and [12]), there is a high task focus and the focus on interpersonal 
relations is secondary. In other trials, such as the acting trial reported by Slater and 
Steed in Chapter 9 or in Blascovich’s chapter (Chapter 8), there is a high 
interpersonal focus because the activity revolves around engaging with others 
rather than around a shared practical task. We can also see a high focus on interpersonal 
relations in the close-knit groups that are described in the chapters about online 
VEs (see Chapters 4 and 5, [15]) where a lot of attention is paid to the way 
participants present themselves to others and to how the rules of relationships are 
followed. 

“Focus of attention” applies not only to how the user perceives or engages with 
the environment and other users, but also outside it - how much distraction there is 
from the frame of the VE. An HMD system, for example, will almost completely 
“shut out” the world around the user, though it has often been noted that wires and 
other obstacles may distract the user. Or again, an IPT-type system, which may 
provide a greater sense of presence in terms of “place” than an HMD system, may 
leave the user with a peripheral sense of others being there in the real world 
(outside the walls of the IPT-type system), and therefore diminish the copresence 
with those inside the VE. Similarly with desktop systems: users may have a high 
degree of copresence if, for example, they are engaged in a highly engaging spatial 
task, or participating in an online religious service - both of which involve a 
common focus of attention. However, this sense of copresence can be weakened if 
they split their attention between others copresent in the VE and another person 
sitting beside them in the real world. 
The difference between this and other frames, which makes the very definition of 
VR/VE in terms of “being there” so important, is that VE frames are entirely 
technologically mediated. However, while frames should be analyzed in terms of 
technological mediation, the analysis of interaction within frames proposed here is 
much like the analysis of real world interaction, unlike other theories of new media 
which focus on “interactivity” or “media effects” and the like. In other words, this 
framework does not treat shared VEs differently from real world interaction, 
except in aiming to compare and contrast CMC with face-to-face interaction, and 
putting the use of shared VEs into the larger context of our uses of CMC and other 
new media in society. 

Another key aspect of applying frames to shared VEs is how much people have 
become used to VEs. As Chapter 6 shows, regular users navigate less. Perhaps this 
indicates that they have become more focused on interpersonal relations rather than 
on moving around in the environment. Axelsson and Schroeder similarly found this 
when interviewing regular users of AW [15]; their involvement in the environment 
depends on how much they have built and routinely interacted with others. This 
also became clear over the course of hourly sessions in AW (see Chapter 7); during 
these sessions with their variety of activities - building together, exploring, making 
presentations, planning, and the like - attention was unevenly divided between the 
environment, the others, the task, and, peripherally, the real world. 

The key variable within the frame is therefore the focus of attention - on the 
copresent others, on the task or interaction, and on the environment. The frame, its 
bandwidth, and our focus in it - what I discussed earlier under the headings of 
presence, copresence and modality of communication - thus shape how we 
experience the VE as a place and how we engage with others. There will continue 
to be various approaches to studying presence and its facets - immersion, “being 
there”, etc. What I am suggesting is that a person’s presence in shared VEs can be 
seen as part of their interaction with others, which includes how we present 
ourselves to others and encounter them in small groups. 

Communication, and especially the modality of communication, can thus also be 
incorporated within the framework of frames and bandwidth: different types of 
shared VEs will provide different opportunities and constraints for presenting 
ourselves to, or communicating with, others. Some examples are described in 
Chapter 2. However, we can also examine language or communication in relation 
to the focus of attention in small groups: who takes a dominant role in 
communication in relation to a certain task [11, 13], or how task related or non-task 
related (socializing) the conversation is, and so on. And we can compare different 
modalities of communication, as Salinas does (Chapter 10). In relation to 
text-based communication in shared VEs, it is clear that there are major differences in 
form and content from VEs with audio or from face-to-face communication: in 
text-based VEs, there is much more focus on addressing each other and gathering 
contextual information, shorter exchanges, etc. (Chapter 12, [17]). 

From encounters between individuals, where the self is presented to the other, we 
develop different roles vis-a-vis others in different circumstances, and thus also 
take on different roles in different networks of relationships. Before moving on to 
the discussion of roles, however, it is important to note that our roles in shared VEs 
relate to focus: focusing our attention on several people in a VE can be 
burdensome, and it seems that this is often an obstacle in small groups in shared 
VEs, especially in comparison with real settings. This is partly to do with the 
restricted field of view, and partly with the absence of social cues in VEs. Put 
differently, it is difficult to experience the “cocktail party” effect in shared VEs 
whereby, in the real world, we can follow a conversation across the room. Yet not 
being able to cope with many simultaneous complex impressions will also leave 
room to focus on other aspects of our interaction with others, as we have already 
seen in a number of examples. 

Roles also depend on the setting. In the “social” VEs that are described in several 
chapters, there is a certain “frisson” for novices who encounter other people in the 
form of avatars for the first time, and they may experiment with how they present 
themselves and which rules they break or follow [30]. Nevertheless, participants 
will also compensate for the absence of social cues and of well-defined roles by 
presenting lots of information about themselves (name, age, sex, location) and 
gathering as much information as they can about others. This is a form of 
interaction to which participants quickly adapt. And over longer periods, we have 
found [15], as has Schiano [36], that participants generally maintain stable roles (or 
“identities”) and increasingly adhere to the norms that they have come to share 
with others. (It can be mentioned in passing here that the novelty or familiarity of 
VEs also ties the frame of the VE to its offline context: online VEs, where 
participants invest more or less in their role or their “online persona”, are an 
obvious example.) 

Interestingly, a number of studies [13, 14] show that role differentiation or a 
“division of labor” can take place “automatically” in shared VEs because of the 
different technological capabilities of the participants - even when they are not 
aware of the difference between the systems they are using. So, for example, in our 
Rubik’s cube trial (see Figure 11.1 in Chapter 11), the person in the immersive VE 
concentrated on the spatial task while the person on the desktop system stood back 
and verbally supervised. 

However, the strength or weakness of the role - for example, leadership in a 
group [11] - is not just a product of the particular encounter or situation, but also 
depends on how strongly roles are shaped or defined in the shared VE as a whole; 
in other words, how pronounced the system of roles or of stratification is (Chapter 
11, [20]), and this will carry over from small groups into larger ones, and vice 
versa. 

Frames apply to individual and small-scale encounters, but they also apply to 
groups, with virtual meeting places as the stages for larger gatherings. From 
encounters, where participants develop roles vis-a-vis each other, share 
perspectives, and engage in relationships of reciprocity, we can thus move to 
analyzing social networks. Yet even here, for shared VEs, it is clear that the frame 
and bandwidth will dictate the density (or lack of density) in larger online 
networks. 

The notion of social networks is particularly useful in this case since all shared 
VEs involve technological networks that create new relationships between people. 
From roles, which are suitable for the study of encounters in small groups, we can 
thus move to larger groups and begin to examine phenomena such as the 
differentiation of roles within groups, the division of tasks (or of labor) (see above) 
and the exchange of resources (see Chapters 3 and 4). A number of larger social 
phenomena described in these chapters and elsewhere - building, social 
conventions, etc. (Chapter 2, [7, 15, 21]) - apply both to small and large groups or 
networks. 

Much here depends, as these studies also show, on how much users have come to 
know each other and how much they have become involved in the VE, including 
shaping and adhering to its norms and helping to build the VE [22]. And again, 
networks tie online to offline behavior, as when we can map the online relations 
onto real world relationships, say, in collaborative groups which meet both on- and 
offline, or in the offline conventions that AW users have in addition to their online 
meetings. 

Apart from the links in larger groups, the study of social interaction will want to 
examine online social structures in various ways - as in the social world at large. 
However, for shared VEs, it is likely that networks will always play a key part in 
the analysis since shared VEs almost always involve networks of tele-immersion 
(the exception here is where several people share the same physical space in the 
VE, say, standing together in an IPT). 

The networks we belong to thus extend our relations to the meso- and then to the 
macro-level, where populations are made up of overlapping network memberships. 
On this level, as several chapters in this volume document (Chapters 4, 5, 6), users 
experience what these authors describe as a sense of “community” in different 
ways. I put “community” in inverted commas since this term seems to imply strong 
and positive ties, whereas “networks” is more neutral and also includes the weak 
ties that often characterize CMC [37]. Analyzing networks is useful because it 
allows us to identify the boundaries of networks, and to address issues such as: 
who has access to particular networks, and with what kind of technological 
capabilities and resources? And what is the density or strength, or the weakness or 
diffuseness, of networks? 

This brings us to stratification, and to the larger question which has often been
posed in connection with CMC and shared VEs, and which Axelsson takes up in
Chapter 11: does CMC equalize the status of participants because of the
absence of social cues and other status markers? Yet, as Axelsson’s chapter shows,
in shared VEs, the effect can just as often be to amplify stratification in new ways. 
In shared VEs stratification and hierarchy depend, for example, on the extent to 
which individuals can display their unique status characteristics so that they are 
recognizable by other participants. Again, this depends partly on familiarity with 
the VE: Chapters 3 and 11 give a number of examples where these characteristics 
are recognizable only to certain participants, such as experienced users [20]. 

Again, these social markers are mediated by the frame of interaction: how much 
attention will participants pay to status markers? How much will they trust them as 
reflecting the characteristics of the “real” users? And from the side of the 
environment, this will also depend on the geography of the VE: how much access 
do people have to each other? Do different spaces foster denser
networks by bringing people together in a shared VE space, or do they segregate
them into different worlds and thus promote more differentiated or diffuse
networks?
I have only begun to sketch a framework for analyzing interaction in shared VEs
and given a few examples of how this framework applies to various findings. Much 
more would be needed to fill in the details of this framework, but these gaps will 
also have to be filled with many more empirical studies which add to our 
knowledge of different types of shared VEs. The study of shared VEs is still at an 
early stage but, as this volume shows, it will be useful to start bringing our research 
together in order to improve the technology and learn from its uses. 
Chapter 2

Social Conventions in Computer-mediated Communication: A Comparison of Three Online Shared Virtual Environments


Barbara Becker and Gloria Mark 


2.1 Introduction 


Recent studies of social processes via the Internet have begun to concentrate on the 
question of whether computer-mediated communication enables people to build up 
social relations with other persons despite geographical dispersion [1, 2]. It still
seems rather unclear whether the Internet can support the development of
new forms of social structures, such as virtual communities, which exhibit social 
binding and social coherence comparable to those in real life. Studies that support 
the assumption that computer-mediated communication generates new forms of 
social systems [3, 2] are confronted with a more skeptical assessment, which raises 
the question of whether the variables used to provide evidence for this are really 
valid [4]. Critics refer to the absence of commonly-shared life-world perspectives 
in online communities [3], while more optimistic researchers point out that a 
common background in online environments is generated by communication [5, 6, 2].

In this chapter we present a theoretical framework for how the Internet may 
function as a means to bind people socially in diverse locations and with divergent 
life experiences. We discuss the notion of real and virtual communities and list the 
preconditions that must be fulfilled before a group of actors can be regarded as a 
community. As a starting point for investigating this notion of virtual community, 
we set out to observe behaviors in various virtual environments. We chose one 
precondition that we feel can be captured through empirical observation: social 
conventions. However, social conventions encompass a wide range, spanning from 
communication rules which serve to establish a common context for members of a 
community [7] (discussed in Section 2.2), to interpersonal behaviors manifest in 
everyday exchanges which serve to coordinate interaction [8]. The latter are more 
easily observable, and our empirical findings describe how these interpersonal
behaviors are expressed differently depending on the virtual environment.
Returning to our earlier point that Internet technology may influence social
binding, we further examine the role that technology plays in shaping
such behavioral conventions. We contrast two alternative hypotheses: 1) that the
technology creates a sense of social presence which influences behavior; and 2) 
that people use the available functionality that requires the least cognitive effort to 
achieve their goals. Lastly, we discuss how our results are a building block toward 
the larger notion that communication strategies can be developed through 
computer-mediated communication, and how they can aid people toward 
developing a feeling of group cohesion and individual belonging in these online 
communities. 


2.2 Theoretical Framework: Characteristics of Social Systems


2.2.1 Technology and Fragmented Societies 

Modern Western societies are characterized by a strong tendency towards
fragmentation and individualization [9, 10]. Traditional contexts and milieus, like 
social classes or peer groups, no longer function as a kind of environment where 
identity development takes place and where people are embedded in solid 
interaction structures. A common ground, developed by general norms or by a
commonly shared life-world, seems no longer to exist. Plurality and the diversity 
of perspectives are typical characteristics of post-traditional societies. 
Consequently, identity increasingly becomes a product of individual ways of 
inventing oneself. In addition, social binding emerges in different subgroups and 
milieus which form a background for these self-creation processes [11] and which
are often described as incommensurable with each other. 

Fragmented and individualized societies are confronted with the problem of how 
to integrate different perspectives and lifestyles to enable comprehension and 
dialog. Several arguments and positions have been developed to answer this open 
question. On the one hand it is argued, for example by Habermas [12], that general 
norms have to exist which form a normative basis to which every member of a 
milieu can refer even if the lifestyle principles of the specific milieus are very 
different from each other. On the other hand, Lyotard [13] among others has 
proposed that we have to accept the incommensurability of different milieus 
without looking for a kind of general focus or viewpoint. Still others, in the 
tradition of Luhmann’s systems theory, point out that we need a kind of general
negotiation system, which allows some kind of interaction between the different 
milieus and social systems by developing strategies and rules which enable them 
to interact with each other and to find transcontextual viewpoints. 
Although all these theoretical viewpoints merit more detailed discussion, we
would like to present yet another idea here. We assume that interactive media, like
the Internet, can serve as a medium which fosters the development of new
forms of social binding. According to this assumption, technology is used to
establish fragile and fluid social structures that reach beyond the diversity and plurality of the
milieus people come from in real life, despite all divergent individual
perspectives. Technology forms a kind of communication framework which
allows, despite all differences in perspectives and lifestyles of the participants, a 
kind of communication which can produce weak social binding on a 
transcontextual level. Following this, we may say that the Internet functions as a 
medium which allows a kind of social integration, because the commonly used 
technology forms the basic framework of communication to which everybody 
refers. The handling of many Internet communication technologies is rather
transparent and easy to learn, so people from different milieus and with different
amounts of cultural capital [14] are able to use them. Of course, it is
necessary to have the financial capital to access such technologies, which are still 
not available in most parts of the world. Yet for those who have access to the 
technology, the Internet can be regarded as a medium which constructs new forms 
of sociality despite traditional social structures and their boundaries. 


2.2.2 Real and Virtual Communities 

So-called virtual communities are a typical example of such a technologically produced form of sociality. By virtual communities, we refer to interactive
environments such as MUDs, MOOs, and 3D graphical systems. Virtual 
communities may be interpreted as fragile social structures which support common 
and transcontextual viewpoints and perspectives [4] on a global, locally 
disembedded level. These social structures are weaker and more unstable than 
traditional forms of communities because the common perspectives are not rooted 
in a concrete commonly shared life-world. The common viewpoints and binding 
aspects within these virtual environments must be built up by communication again 
and again, so they are highly fluid and fragile. Furthermore, within these
virtual social structures, communication has to generate the significance and meaning
that, in locally embedded social structures or in traditional social forms, emerge
from shared life-world perspectives and inherited perspectives and habits [1, 6].

From this perspective, we looked at virtual communities on the Internet. We
chose collaborative virtual environments as a field of exploration to find out
how a kind of common basis is generated within these environments and to what 
extent people refer to the same context of meaning when they are entering the 
space. We wanted to investigate how communication creates social binding, and 
whether this kind of social binding is comparable to that in real life communities. 

Before looking at virtual communities in more detail, a further look at 
sociological research about characteristic aspects of communities seems to be 
appropriate. According to a number of sociologists and philosophers [9, 10, 15, 16, 
17, 18] social communities are based on some preconditions which have to be 
fulfilled before we may speak of a community. The most important are: 
• the persistence of members’ identities; 

• a commonly shared normative basis; 

• the existence and stability of social conventions; 

• a common interest; 

• a collective rationality; 

• being rooted in the same geographical/local place; 

• the continuity of the group. 

The question arises whether these characteristics can be found in virtual 
communities. We have already mentioned that commonly shared viewpoints and 
meaning have to be created in virtual communities in the process of 
communication [5] because they have not emerged from embeddedness in the
same life-world, from traditional ways of interacting, from a common lifestyle and
language, or from inherited, incorporated habits [14, 19]. Our goal was to observe
virtual communities and look for evidence of the characteristics
described above; as a starting point, we focused on one such
precondition, namely, whether we could discover the use of social conventions.
Accordingly, our empirical research focuses on one aspect: it was our intention to 
explore how this common background is created within these online environments, 
how social conventions are generated by communication, and how the technology 
forms and influences these conventions. 


2.2.3 What are Social Conventions? 

Before looking at three different virtual environments from this perspective, we 
should discuss what is meant by social conventions. Especially in social 
philosophy, social conventions have been described as normative rules of conduct 
which are based on implicit ethical imperatives [20, 19]. On this view, social 
conventions are accepted by group or community members even if they have the 
opportunity to behave in a different way. Social conventions not only determine 
how to behave within a group, but furthermore define some behavior as incorrect. 
Following this, they guarantee the stability and consistency of a social system. 
Discussions in social philosophy normally distinguish between
implicit and explicit social conventions [21, 7]. Some social conventions are
articulated by explicit agreements, or even laws, which have been established by 
institutions or authoritative persons. However, more often social conventions are 
implicit. They determine the behavior of members of a social system without being 
codified or formulated. Therefore, we assume that an investigation of the use of 
such implicit social conventions would give insight into the social practices of a 
social system, i.e., demonstrating how people behave and act [22]. Furthermore, 
social rules are the underlying preconditions of communication [23, 24] because 
the way people communicate with each other is embedded in social practice and 
specific lifestyles, which are determined by implicit social conventions. On this 
view, social rules function not only for comprehension, but also for coherence 
within such a system by establishing a common context [19] and a common
normative basis. 

Research in social philosophy has pointed out that new members of a specific 
social system have to become aware of these implicit social conventions [7, 23]. 
By learning and accepting them, they will be integrated into a group or community. 
Furthermore, one will only be able to communicate with and understand a partner 
if one has gained some experience of these implicit conventions. Thus, if we
regard collaborative virtual environments as specific forms of social systems, it
seems a promising research strategy to explore the implicit and explicit social
conventions as a first step toward gaining an insight into the particular social 
practice within such environments. 

Other empirical studies have addressed social behaviors in virtual environments, 
such as the nature of turn-taking and avatar movement [25], dynamics of virtual 
meetings [26], movement in the virtual world [27], experiences from a mixed-reality environment [28], identity construction [29], cultural formations [2],
communication in online communities [16], and observations in text-based MUDs 
[30]. In this chapter we focus instead on the relation between social conventions 
and communication. 


2.3 Methodological Approach and Research Setting

We employ an approach using ethnomethodology whereby, through observation, 
the social conventions which guide the behavior and attitudes of members of a 
social system can be identified. In ethnomethodology, social systems are regarded
as a net of meaningful behavior that is governed not only by formal rules and explicit
conventions but, more often, by implicit conventions which are,
to some extent, open, contingent, and flexible. Empirical events can be explained
through the description of single phenomena that have been observed, rather than 
attempting to identify global structures or formulating general laws. Therefore, we 
concentrated our research on obtaining detailed descriptions of conventions in 
communication to get some insight into the social practices of these environments. 
We selected a set of social conventions to observe that we felt were important 
regulators in face-to-face communication, and which are described in the next 
section. 

Three different online environments were chosen in which to study the existence 
of social conventions: Active Worlds (AW), Onlive Traveller (OT), and 
LambdaMOO (LM). All environments are accessible on the Internet. These 
environments were chosen primarily because they had been in existence for some
time and offered different functionality for communication and representation and
thus, we expected, for the expression of social conventions. The main differences are
that LM is purely text based for both representation and communication, OT has 
graphical 3D representations and offers text and audio for communication, and AW 
has graphical 3D representations and offers only text for communication. The basic 
functionality available for the representations and communication is described
below: 

AW: full-bodied avatars can walk and exhibit movements of waving, jumping, and
dancing, activated by mouse clicks. Avatars can move in three dimensions by using 
the arrow keys. Communication between people is text-based by typing on the 
keyboard. All public messages appear in a scrollable window and also above the 
avatar head with the avatar name for 30 seconds or until the next typed message 
appears. 

OT: the avatars are heads and have four different emotions that one can activate by 
a mouse click: happy, sad, surprise, and anger. The avatars exhibit what appear to 
be random eye blinks. Movement (three dimensions plus rotation in four directions 
- left, right, forward, and backward) is by using arrow keys. Communication is 
audio (outgoing audio is activated by pressing down the control key and speaking 
into a microphone) or text based (pulling down a menu, selecting an avatar, and 
typing a message which appears on the screen). The text is limited to two lines. 

LM: all representation of users and communication is text based. Different 
commands are used for communication (e.g., say, whisper, emote), manipulation 
(e.g., get/take, move), information (e.g., look, who, etc.), and creation (e.g., dig, 
create), as well as others. 

Three different researchers spent time observing three different online 
environments. Total observation time was approximately 59 hours: 21
hours in AW, 20 hours in OT, and 18 hours in LM. Each observer was primarily
responsible for making observations in one particular environment, but all 
observers also spent time in each of the other environments to become familiar 
with them. Although the online characters adopted by the researchers varied 
somewhat, most of the time the same online characters were used during the time 
spent in the environments. The observation was performed during May - June and 
October 1997 for LM, and September - October 1997 for OT and AW. The 
observers took notes and recorded behavioral observations under assigned 
categories of social behaviors, described in the next section. The observers met 
periodically and compared observations to make sure that the categories were 
being coded consistently. Online recording and logging were not performed due to 
privacy concerns. 


2.4 Results 

We chose a set of social convention behaviors to observe that, according to
Scheflen [31], serve a regulatory function among actors by initiating, coordinating,
and closing interaction. The results reported here are part of a larger study
investigating social behaviors in virtual environments. For a more detailed
description, see [2].
2.4.1 Contacting Others: Greeting and Acknowledging

The first convention we address is that of contacting others: greeting and 
acknowledging. We focused on such a convention since the form of a greeting can 
influence subsequent interaction. Further, in a virtual environment greeting rituals 
can be carried out in a number of ways or they may not exist at all. In all 
environments, informal greetings were regularly used to initiate conversations. Yet 
the ways of approaching and greeting another took on different forms. 

In OT, greetings are usually directed at individuals or to a specific group. The 
greeting is usually audio based and the initiator of the greeting usually repositions 
the avatar to face the other. Greetings are not made immediately upon entering the world;
instead, avatars are often observed to scan the scene first (by rotating or moving
around). The avatar then navigates to a position close to another before it initiates a
greeting. Reciprocity in greetings was also found. If the observer’s avatar is not
already positioned directly in front of another avatar, the other will turn to face the
observer, which corresponds to what Goffman [32] describes as becoming engaged in
talk through face-to-face contact. In fact, sometimes considerable trouble was
taken to reposition the avatar. Actors first respond with audio - when the audio is 
working and quality is good. Once, when a person took a long time to respond, he 
apologized saying he was overwhelmed with text messages. Smiles (shown on the 
avatar’s face) were not observed to have an effect in initiating conversation, nor 
were they observed to be reciprocated when they occurred. 

In OT, new contacts are made by moving the avatar to face another and 
addressing the other with audio. When avatars are spatially very far away, they are 
generally not approached by other avatars. This was observed with other avatars 
and tested by the observer who positioned herself far away. The observer received 
several text messages in greeting, but was never approached by an avatar. The 
face-to-face positioning during interaction is a convention also found by Bowers et 
al. [26] in a virtual environment where audio was used. 

In AW, greetings are first made as more of a public greeting, to all in the room 
(but only to a set of avatars who are close enough to see the greeting). Greetings 
are usually made by the person at the time the figure joins the location. Only about
30% of the time does an avatar move close to another and face it when making a greeting
and as the conversation continues. Private greetings may be made to
individuals afterwards, using the avatar name. Reciprocity was also found, but the 
response to a greeting is not from those avatars in closest proximity but from 
anyone within a group of up to 12 avatars or so, and commonly two or three others. 
Gestures for greetings, in the form of an avatar hand-wave, were returned a few 
times when initiated by the observer, but the observer never received a wave from 
another as greeting. 

Similarly, in LM contacts are first made as more of a public greeting, to all in the 
room. The whisper command in LM may then be used, which allows private 
communication. Acknowledgment is also made by only a few in the room. Text 
descriptions of facial expressions and body gestures in LM are sometimes used as 
greetings (e.g., emote smile). These are often acknowledged by others. For
newcomers, a convention is used, following a description in the tutorial, that one 
announces “Hello, I am new here”. People often offer their assistance in response. 
2.4.2 Commitment to a Speaking Partner

In face-to-face conversation, uninterrupted flow is one type of social rule that is 
agreed upon between speaking partners and serves to govern conversation [33]. We 
observed that social behaviors differed in the environments for remaining with, and 
changing, speaking partners. In OT, commitment to a speaking partner was 
certainly influenced by the face-to-face stance of the avatars. The observer noticed 
that she herself felt a social obligation to remain for a short time speaking with 
another, once the face-to-face avatar contact was established. When speaking 
partners parted, generally a farewell was exchanged between the individuals or 
other members of the small group. 

In AW, the avatars did not change their position very often when new contacts 
were made. It was possible to change communication partners by simply typing in 
a new avatar name in the public text window. Speaking partners appeared to 
change more often in AW than in OT, which indicates that less time was spent with 
each partner. In this setting, using an avatar’s name in the greeting signaled that the
message was intended for a specific avatar. It was also observed that farewells 
were said to the whole group. 


2.4.3 Group Interaction Strategies 

In face-to-face conversation, spatial positioning indicates who is clustered in a 
group. A number of social rules exist to govern group interaction: agreements on 
spatial territory [35], the closeness of members [36], and common group behaviors 
[37]. We were interested to see what type of behaviors we could observe when 
actors were conversing in groups in the virtual environments. 

In OT, the avatars’ graphical positions give information about who they are 
interacting with. When a group exists, actors generally welcome someone into the 
group by repositioning themselves to form a circle thereby including the new 
member. One sees by scanning the environment who is interacting with whom, and 
the size of the group interaction. It is rude to simply barge into the middle of an 
existing group. When one approaches a group, the actor generally rotates their 
avatar around to see whether they are blocking another. Sometimes participants 
will pull their avatar far back to see the complete positioning of the group 
members, as a way to compensate for the avatar’s lack of peripheral vision. New
visitors to OT (who were identified as such by others’ references to them) often
come into the middle of the group without looking around. When the
observer or others did this, it provoked annoyed reactions.

In AW, since actors reposition themselves less often to face another, or to form 
groups, group membership is determined by the text flow in the scrolling window, 
i.e., who is talking with whom. Thus, the visual information becomes less 
important than the text for this purpose. The study by Kauppinen et al. [38] 
confirms these observations, adding that in AW the failure to reposition
avatars can lead to confusion. Sometimes the avatars would be layered on top of
each other and, with similarities in costumes, identification became difficult. In
addition, the text dialogs, which all appeared in the public window, sometimes
became too complex to disentangle into separate group conversations.

In LM, group membership is often unclear. Only by observing the text dialogs 
can one ascertain who is talking with whom to get an insight into the interaction 
structures within the environment. However, if people are conversing with the 
whisper command, group membership is completely unknown (which can also 
happen in OT and AW when private messages are sent). 


2.4.4 Signaling Privacy 

One of the most common ways of signaling private conversations in face-to-face 
environments is through spatial positioning; speaking partners separate themselves 
physically from others [33]. Chat rooms on the Internet are based on the model of 
physical architecture, offering private as well as public rooms. In the environments 
we looked at, there was also an additional method of indicating privacy: sending
private text messages.

Yet we observed that, especially in OT, people took advantage of the graphical 
information in the environment to remain in the same large space and still engage 
in private conversations. For example, two avatars were once turned completely 
upside down to signal that they were having a private conversation (with their own 
common perspective). This was confirmed when the observer (who was right side 
up) approached them, tried to join in, and was not acknowledged. Joint movement 
can also indicate privacy, e.g., moving below the floor to a semi-hidden location. 
Absence of lip movement in avatars facing each other generally means they are 
having a text conversation, and this is often an indication that it is private, since the 
observer was generally not acknowledged. (It should be mentioned here that the 
avatars may also be disconnected from the system, but then the avatars vanish after 
about a minute). Two avatars conversing far off in the distance from the main 
arena also signal a private conversation. 

In AW, avatar positioning is sometimes also used to indicate a private 
conversation. This was observed when two avatars were positioned face-to-face 
and very close together. However, privacy can also be arranged by actors moving 
to another location where others cannot see their text messages. This would be 
done by moving away, or teleporting to another location. Private telegrams can 
also be used (text messages), but this function only exists for paying members. 

In LM, because no visual information about dialog situations exists, people can 
create their own private spaces without being seen by others (using the whisper 
command). Thus, one is not aware of disturbing the intimacy of others. 

In all environments, when avatars are engaged in a private conversation, the 
reaction to any attempt to enter into the conversation is simply to ignore the outside 
party. Privacy could be signaled by visual means, by the positioning of the avatars, 
by changing the communication channel (as in OT), and even by changing 
language, as observed by Kauppinen et al. [38]. 
2.4.5 Interpersonal Distance

In the physical world, people maintain a distance from other people during 
interpersonal communication which serves as a zone of comfort. Evidence that 
interpersonal distances were perceived was found in both OT and AW. Similar 
results that confirm these observations were also found in [38] and [39]. 

In OT, positioning an avatar too close to another provoked annoyed responses 
from these actors, which suggests that they felt that their social distance was being
violated. This implies that a perception of such an interpersonal space exists. 
Sometimes avatars in OT moved quickly into the distance as a response, or they 
turned to the side. The reactions to closeness could also be due to the avatar’s 
blocking one’s view. The observers tested this hypothesis by moving close to 
others on the side without blocking their view, but the same reactions were 
observed. 

In AW, similar types of reactions were also observed when avatars came too 
close to one another. In AW, the text above the avatars overlaps when the avatars 
are too close (text also appears in the window below). The comments of 
participants suggest that it is not the text overlap that people are annoyed about, 
since their comments refer to a violation of social distance, e.g., “You’re
too close, I can’t breathe”. 

In LM, interpersonal distance was expressed through text, e.g., “emote: comes 
close”, but such commands seldom occurred, and no reactions were observed. 


2.5 How does Technology Influence Social Conventions?

These empirical observations reveal a number of social behaviors in virtual 
environments that we consider to be conventions in that they fulfill a regulatory 
function in interaction. Our hypothesis, according to which technology creates new 
forms of social systems beyond real-life milieus, includes the idea that the 
technology itself may influence how social binding emerges within these online 
environments. We assume that the specific media and functionality which is 
available will influence how a common background is generated, which social 
conventions emerge in the communication process, and whether these new forms 
of social binding are stable or not. 

It is not yet clear, however, how the technology might exert an influence. We will
consider two different explanations of the role of technology in
influencing behaviors. On the one hand, the technological environment may be
perceived as a window to a shared space, or a “portal”, as suggested by Kauppinen
et al. [38], which connects people to each other. Then, depending on the “clarity”
of the portal that the technology affords, people would perform those actions that
they would normally perform when they believe they are in the presence of other
people. A second explanation concerns the nature of the technology itself: the
handling of the specific media and functionality may in itself lead people to perform
certain actions. On this view, people choose the functionality that enables
efficiency. We begin by discussing the first explanation in more depth.


2.5.1 Social Presence in Virtual Environments 

Although in many ways we can argue that the conventions in the different 
environments are comparable, the specific behavioral expressions differed. Since
the existence of such regulatory behaviors suggests that people are trying to
develop online communities, this raises the question of why different conventions are
used in different online environments. In other words, although the Internet offers
a common basis for communication, we observe that communicative acts are 
expressed in different ways. 

One clue here is that all of these environments offer different media and 
functionality for communication, navigation, and representation. Our hypothesis is 
that social conventions in such virtual environments are more socially binding if 
the technology supports a sense of social presence of the other actors. This idea 
refers to social presence theory [40] which states that the nature of the medium has 
an effect on the type of interaction. The stronger the perception of non-mediation
in the environment, the stronger the feeling of presence [41]. Social presence is a
perception of others that is enabled by a particular technology. Presence thus 
becomes an interim variable which mediates interaction and, specifically in our 
study, the expression of conventions. As Short et al. [40] describe, audio-only (and 
text) media fail to convey a number of visual cues present in face-to-face 
interaction, such as facial expression, eye gaze, gestures, and proximity. And 
where important visual cues such as gaze are missing, especially those which serve
as coordination devices for face-to-face partners, we would expect interaction to
be distorted compared to face-to-face interaction. The
degree of social presence is determined by how many such non-verbal cues
the medium conveys, and it influences how present or distant one feels from
another person. A high degree of presence creates the illusion that one is directly
interacting with another, and the medium becomes less apparent [41].

Thus, we would expect that the greater the ability to communicate a range of 
non-verbal cues in a virtual environment, the stronger the sense of social presence 
that would be created. Of course, the task is a further variable that influences the degree
to which people rely on non-verbal information; for example, problems of an
intellective nature are generally expected to rely less on non-verbal cues. Yet in the
environments that we investigated, the tasks were uniform: socializing and meeting
people, activities which are strongly affected by non-verbal cues.

How social presence might be conveyed in these environments is not so clear-cut.
Table 2.1 presents a summary of the different media and functionality
available in these environments (complementing the earlier description of these 
functionalities in Section 2.3). 

On the one hand, based on media research which shows that visual media
facilitate more presence than audio, and audio more presence than text media [40],
we would expect that OT, which contains visual and audio media, would facilitate
more presence than AW, which contains visual and text media and which, in turn, would
facilitate more presence than LM, which contains only text media. Yet this
prediction is made more complex by the fact that in all environments, the 
functionality exists to convey some type of non-verbal expression. We see in Table 
2.1 that in OT, people can activate the avatar to show one of four standard 
emotions. In AW, the avatars can also be activated to show one of four standard 
gestures, and in LM, an emote command is designed for expressing emotions. 


Table 2.1 Different media and functionality available in the CVEs observed.

CVE | Communication channels | Representation of actor | Navigation | Representation of environment | Non-verbal cues
OT | Visual (3D) + audio | Visual avatar | Visual, with mouse and keyboard | 3D graphical | Avatars have a set of standard expressions; eye-blinks and lip movement
AW | Visual (3D) + text | Visual avatar | Visual, with mouse and keyboard | 3D graphical | Avatars have standard gestures; random motions
LM | Text | Text description | Commands with text | Room metaphor, from text | Emote command


However, the observers discovered that these “pre-canned” avatar expressions
were seldom used; instead, people conveyed emotions and expression through the
communication media. In OT, emotions were expressed via speech, e.g., laughing
or an utterance such as a sigh. In one user’s words, “when you laugh, that says
a lot”. In AW, emotions were instead communicated with text, e.g., :o) or
*blushing*. The use of emoticons was common, and they were also used in LM. In
LM, emotions were also expressed with both the “emote” and “say” commands.

It is true that in the graphical environments, the avatars show random 
movements, e.g., blinking their eyes, or folding their arms, but the observers 
agreed that after a short time watching the avatars, these movements did not 
convey much non-verbal expression. Thus, according to social presence theory, we 
would still expect that OT actors would experience the greatest amount of presence 
due to the graphical information and audio, that AW actors would have a moderate 
amount due to the graphical information and text, and LM the least amount of
presence due to the pure text medium. 


2.5.2 Hypothesis 1: Conventions are Shaped by a Sense 
of Social Presence 

We now re-examine the observed differences in how online conventions are
expressed, considering how a social presence hypothesis might explain these
results:

Contacting others: Greeting and acknowledging 

Social presence theory would explain why actors in OT moved close to and faced each
other, whereas actors in AW did not: the audio channel in OT created a stronger feeling of
presence than in AW. Spatial audio forced the actors to come close enough for the
audio output to be clear, and the sense that the others were present and “inhabited”
their avatars led people to rearrange their positions to face the others.

Commitment to a speaking partner 

Conventions differed in the environments for changing speaking partners. 
According to social presence theory, the face-to-face stance in OT combined with 
the audio medium would lead people to become more engaged with others in 
conversation in OT compared to AW. And actors were indeed observed switching 
conversation partners more often in AW. Just as in a real cocktail party, people 
may move from one group to another, but social pressures exist for people to spend 
time with another person in conversation, without leaving too abruptly. 

Group interaction strategies 

The careful repositioning of OT avatars to make room for a new member in the 
group’s circle can be explained by a feeling of social presence. In a similar vein, 
the lack of repositioning in AW when conversation partners changed is consistent 
with a lower sense of social presence. In fact, the confused layering of avatars that 
Kauppinen et al. [38] report supports the idea that users in AW do not behave as
though they strongly believe that their avatars are “inhabited”. 

Signaling privacy 

A sense of social presence in OT and AW would have led people to move away 
from others to engage in a private conversation, since it is impolite to speak 
privately in front of others. However, due to the nature of our methodology, we did 
not measure the exchange of private text messages (which could be done, for 
example, through logging techniques); thus, we cannot judge the amount of private 
conversations in the environments. However, we can say that private conversations 
did take place in all environments. Perhaps even a weak sense of presence might
trigger the desire to meet with another privately. 

Interpersonal distance 

Social presence theory would predict that such reactions to violations of personal 
space would be stronger in OT where presence should be greater. In fact, however, 
such reactions occurred in both OT and AW. A closer examination shows that in 
violating interpersonal distance, conversation exchange (i.e., an audio or text 
channel) is not involved. Only moving avatars too close to one another results in a 
violation of the interpersonal space, and this is conveyed through the visual 
channel. Thus, these actions are not contrary to social presence theory since the act 
involves only the visual medium which is the same in both of the environments. 


2.5.3 Hypothesis 2: Cognitive Ease in Handling
Functionality 

Although social presence theory accounts for why some conventions are used, it 
does not fully explain how technology might mediate the formation and use of 
conventions. We turn now to an alternative explanation which is parsimonious and 
concerns the interface design. According to this explanation, the interface design, 
functionality and media in each of these environments influence the actors in their 
behaviors. For example, spatial audio in OT forces an avatar to move close to
another during conversation; otherwise, the actors cannot hear each other and
must send text messages instead. In AW, conversation is mediated with text, and moving
close to another avatar that one is communicating with is not necessary. This 
explanation involves the notion of cognitive ease: functionality is used in such a 
way that it requires the least cognitive effort to reach the goal. This view is based 
on the model of a user who strives to conserve limited processing resources [43]. 

Contacting others: Greeting and acknowledging 

Since the audio in OT is designed for spatial perception, the avatars must move 
close together to hear each other. If actors want to communicate with text and still 
remain spatially distributed they may do so, but with our observation methods, we 
could not determine how many actors were communicating with text. It was the 
observers’ own experience that text was used when the audio quality was poor, and 
even then, avatars usually faced one another. Yet since simply moving close is
enough to activate the spatial audio, a cognitive ease explanation does not address why
actors sometimes went to considerable lengths to face one another. Cognitive ease would
certainly explain why, in AW, the avatars generally did not face each other to greet
or reposition themselves when they continued the conversation. It was simply less effort to
write a new name in the text window than to move the avatar.
Commitment to a speaking partner

Cognitive ease applies here as it does with greetings. It would predict that in AW it
is easy simply to type a different avatar name in the public text window in order to
change the conversation partner, and that it is not necessary to change the
avatar’s position. In OT, when one wants to use audio, the actor must move the
avatar to another location, which requires more effort. Therefore it is easier in OT
to stay longer with the same conversation partner, since there is a cost involved in
using the functionality to switch partners.

Group interaction strategies 

Cognitive ease does not explain why, in OT, the actors carefully positioned
themselves into group formations. It does, however, account for the behavior in AW:
for the same reason described above, it is less of an effort to determine group
membership by looking at the chat window than by repositioning the avatars to form
a configuration that indicates group membership.

Signaling privacy 

According to cognitive ease, it is easy to have a private conversation simply by 
changing the communication medium in OT, e.g., from audio to text. An argument 
against cognitive ease is that it takes more effort to signal privacy in OT through 
graphical means, such as by turning the avatars in a private group upside down, or 
by moving to a distant location. 

Interpersonal distance 

Cognitive ease would not explain why reactions to violations of personal space 
occurred, nor can it explain why a perception of interpersonal distance appears to 
be transferred from the physical environment to the virtual. 


2.6 Discussion 

So far in this chapter we have argued that the presence of social conventions 
supports the notion that virtual environments have emerged as a new form of social 
system for geographically dispersed people. We have discovered that conventions 
exist in all the environments we observed, but that they are expressed differently. 
This led us to explore two different hypotheses for how technology in a virtual
environment might mediate the formation and use of conventions: 1) that a 
particular technology facilitates a sense of presence that others are really in the 
same shared space; and 2) that behaviors result from handling the available 
technology to navigate and communicate efficiently. 
2.6.1 The Influence of Technology on the Expression of
Social Conventions 

In evaluating the two hypotheses, there is no clear overall conclusion. According to 
social presence theory, OT should have afforded the greatest sense of presence, 
leading people to perform social behaviors as if they felt that others were sharing 
the same space with them. And social presence theory does explain the face-to-face
positioning of avatars in OT, as well as accounting for violations of personal
space. Especially in virtual environments which offer graphical representations, the 
nature of these conventions suggests that people seem to identify to a large extent 
with these representations. They feel more responsible toward their conversation 
partners as evidenced by, for example, explaining why and when they have to 
leave, and reacting sensitively to the violation of personal space. However, we also 
see differences between the two graphical environments. In OT, people appear to 
behave as though they “inhabit” their avatars, through their care in repositioning 
themselves and facing each other when speaking. In AW, the avatars seem to 
function more as a marker, especially for navigation. Also, the expression of a 
social distance suggests an identification - to some extent - of the physical body 
with the graphical representation. It also suggests that the space in the virtual 
environment is understood and translated as a space similar to that in the physical 
world, one which contains a particular set of behavioral expectations [44]. 

Cognitive ease, on the other hand, makes sense in explaining how conversation 
partners are changed in AW (i.e., in the text chat window). If changing partners is 
as easy as typing in a new name, then it is not worth the effort to navigate to a new 
location to interact with another (as long as one can see them). Considering this 
result together with OT, we therefore propose the following which takes both 
hypotheses into consideration: an environment which conveys a high level of 
social presence will lead people naturally to apply social behaviors that they use in 
face-to-face interaction, and users will try to use the technology to mimic such 
behaviors. If, on the other hand, this feeling of social presence is low or
lacking, there is less social pressure to follow a face-to-face interaction model
so closely. Conventions do exist nevertheless; however, we argue that their 
expression arises from the amount of social presence in conjunction with how the 
functionality and media can be used in the environment. 

However, neither hypothesis explains the most fundamental finding: that conventions
exist at all in online environments. For this reason, we argue that the existence of
conventions supports our hypothesis that virtual communities have to establish a
kind of common background to which people can refer, beyond all individualistic
or milieu-specific differences. This common normative background is established 
by communication to overcome the lack of shared life-world perspectives within 
these environments. Our findings support our assumption according to which 
social coherence can only be built up in online communities if people can refer to 
shared beliefs and common interests. As these are not rooted in the
same life-world or in shared geographical or intellectual neighborhoods,
communication strategies like social conventions must substitute for this absence.
In fact, the use of social conventions is widespread in many Internet
environments, such as newsgroups, one example being the avoidance of capital
letters, which indicate shouting. In a survey of newsgroup users, most reported
that they felt a sense of belonging and a feeling of closeness to other
newsgroup members, which Roberts [42] argues is evidence that a sense of
community is felt across many types of newsgroups. The fact that conventions
emerge on the Internet when only a text channel is used shows how strong the
urge is for users to establish conventions as a form of regulating and establishing
common communicative behaviors.


2.6.2 Shaping Culture through the Virtual Environment 

Just as environmental factors in the physical world, such as climate, terrain,
and available natural resources, shape culture, we should also expect
the technological environment to shape the culture of its inhabitants.
design of the virtual environments may contain appropriate metaphors and cues 
that guide users to act in certain ways. When we consider that people often transfer 
the conventions that they use in interaction in the physical environment to 
technology use [34], then the metaphors and cues in the environments can trigger 
the use of specific conventions. As mentioned earlier, in text-based newsgroups 
many linguistic conventions have developed, ranging from abbreviations, to 
determining ways for authenticating user identity and information through writing 
conventions [29]. Thus, we should expect other facets of technology to shape 
culture and influence the formation of conventions as well. A good example of 
what we have seen in this respect is using the technology to position the avatars in 
unusual ways, for example upside down, to signal privacy. Another example is 
switching communication channels to engage in a private conversation. 

The expression of emotion was also, to some extent, shaped by the media 
available. In OT and AW, users were given a choice for non-verbal expression: 
audio (OT), text, or changing the avatar expression. In all three environments, 
emotion was conveyed through the media that provided the most expressiveness. It 
is more direct and natural to express emotion through speech via the audio channel
in OT than to activate an avatar expression. Further, speech provides a stronger,
more individualistic, and more nuanced expression than a “pre-canned” standard 
avatar look. Similarly, with AW, the text medium provides a richer way to express 
emotion than a standard gesture, even when emoticons and linguistic conventions 
(for example, LOL, for laughing out loud) are used. The emote command was used 
quite often in LM to express non-verbal emotion, and its availability may have 
encouraged its use. 


2.6.3 Social Binding 

Our sense is that a feeling of social presence must influence the degree of social
binding, or the adherence to normative behaviors. If we compare these findings with
the characteristics of newsgroups, we may say that in newsgroups social binding is
produced by commonly shared topics and interests, while in communities like
MUDs and MOOs, this social coherence is generated more by social conventions
and social presence.

Undoubtedly certain actors exert a social influence in shaping virtual behaviors, 
since these are interactive environments. The presence of “gurus” and expert users 
in the environments, who often gave helpful advice and tips on using the systems, 
is likely to have influenced behaviors by directing people toward certain 
functionalities. The capability of observing others’ actions, which is more evident 
in a graphical environment, most certainly also plays a role in spreading codes of 
social behavior. 

Our results therefore demonstrate that virtual worlds can become a kind of 
specific milieu [11], including characteristic ways of using language, specific 
modes of interaction and particular ways of getting in contact with each other and 
keeping communication lively. In communication processes, people create a 
specific meaning within these environments; that is, they develop a kind of code
which is understandable only to frequent participants and which excludes other
codes. This seems to be especially true for LM, a virtual environment which has
been in existence for longer than the others. However, we may say that, in general, 
social conventions play an important role in developing a specific code of behavior 
and language which creates social coherence within these online environments. 
People who are coming to these virtual spaces for the first time have to become 
aware of these conventions and also follow them to be accepted by the others. In
fact, people report being uncomfortable about their lack of knowledge of the
conventions. They claim their messages are not taken into account or that they are
treated as outsiders who have to learn how to behave - as one user put it “....cause I 
didn’t know the ‘in-jokes’ and the current word games”. 


2.7 Conclusion 

We suggest that our empirical findings can be interpreted as an indication that 
computer-mediated communication generates and transforms social structures. 
Even if social binding within these “virtual” social systems seems to be weaker 
than in traditional social systems, there exists some group coherence in these 
communities through shared codified behavior [1,6] which enabled us to use an 
ethnomethodological approach [45]. In addition to other factors, such as common 
themes in newsgroups, this social binding may also be facilitated by social 
presence. In these online environments, communication seems to be possible even 
if individuals are not members of the same social milieu and even if they have a 
different social background. People can, on a very superficial level, begin to 
communicate with each other without having to refer to the same life-world and 
shared beliefs. Accordingly, technology and how it is used form a new context 
which is accessible to people from very different milieus. It enables them to 
understand each other in spite of these differences. With this, technology may have 
an integrative social effect, and we propose that it might even counteract the 
tendencies of fragmentation and individualization in modern societies.

However, we have to concede that our findings are only a first step in finding
evidence for the existence of social structures on the Internet. If we look back at
what we described as typical characteristics and preconditions of social
communities, it is clear that the social conventions we found cannot be regarded as
satisfactory proof of virtual sociality. Further studies are needed, and we hope
that our initial attempt will provoke further research in this direction that
takes our theoretical framework into account. So, even if it is still unclear whether
computer-mediated communication may function as an integrating mechanism by clustering
different social perspectives, we propose that our first findings support the position
of sociologists like Knorr Cetina [46], who argues that technology is not only born
of social systems but also serves to create and transform social structures.
Chapter 3

Living Digitally: Embodiment in
Virtual Worlds

T.L. Taylor


3.1 Introduction 

In effect, I suppose I was unknowingly using my second reality as a social 
experiment and it has become very much a learning experience for me. 

Meg, virtual world user. 


Designers, and the code they construct, go a long way toward making a virtual 
world real. They fill it with objects and spaces, properties and behaviors. 
Sometimes they create imaginative scenes only found in science fiction or fantasy. 
Other times they help mirror the offline world by creating more straightforward 
representations of our everyday environments. Significantly, in each case they
provide a means of embodiment for the user. For graphical worlds, this comes in
the form of avatars - those pictorial constructs used to actually inhabit the world. It 
is in large part through these avatars that users can come to bring real life and 
vibrancy to the spaces. Through avatars, users embody themselves and make real 
their engagement with a virtual world. They often push back on the system - 
asking more of it, turning its sometimes limited palettes into something other than 
what was intended. Avatars, in fact, come to provide access points in the creation 
of identity and social life. The bodies people use in these spaces provide a means to 
live digitally - to fully inhabit the world. Users do not simply exist as
“mind”; instead, they construct their identities through avatars.

To examine how digital bodies can facilitate life in a virtual world, I am going to
focus my attention on a particular graphical multi-user system, The Dreamscape.
The environment is a “2½D” world in which the user looks at their avatar from a
third-person perspective. Although it is not a three-dimensional space, I would
argue that it still very much constitutes a virtual environment (as text-based MUDs 
- multi-user dungeons or dimensions - do). Users engage in real time with an 
immersive simulated world in which objects and others occupy the space. Avatar 
bodies (of which there are ten varieties in The Dreamscape - five male and five
female) can be changed at will by purchasing new ones (with either “inworld” tokens
or “real” credit cards). Avatar heads, which are separate objects from
the rest of the avatar body, more commonly serve as the main means of
customization and individualization in the world. They too can be purchased and
are often given as prizes or gifts. These heads and bodies can be further customized 
through the use of “spray paints” to change the color of the clothes, skin, and hair. 
Finally, many different accessories (such as hats and jewelry) as well as more 
mundane “daily” objects (like coffee mugs) can be used by the avatar as well. 

In terms of simple communication, both the bodies and the heads contain a range 
of gestures and expressions, some of which are specific to that particular graphical 
representation. Actions or facial expressions are initiated by either clicking on an 
appropriate button or through keyboard commands. Colored speech bubbles, 
containing the typed text of the user, appear above the corresponding avatar’s head. 
Private speech is also allowed through a backchannel method. 

This particular system is one of the oldest graphical environments around, and its 
original incarnation dates back to 1985 [1]. While the number of users has varied
over the years, the latest figures put average US nighttime use somewhere around
500 (this total is for both worlds running the software). I have conducted an in-depth
ethnography of the space (including several different “worlds” that use this
software) which ran over approximately two years and included hundreds of hours 
of participant observation. In addition, I have interviewed both designers and users. 
Doing online research of this sort provides particular challenges, both in terms of 
the multiple mediums at work as well as for questions of authenticity and plurality 
that can be raised [2]. Interviews took place through a variety of formats, including 
email, telephone, and in person. I also conducted a number of group discussions. In 
addition, I participated in two offline “gatherings” in which users came together for 
a weekend mini-convention to socialize and talk about the virtual world. The 
following observation and analysis is drawn from that research. When quoting 
from interviews and personal conversations pseudonyms are used to protect the 
anonymity of informants. 


3.2 Social Life 

When thinking about how social life gets created online and how its attendant 
communication occurs, avatars are particularly powerful artifacts to consider. They
prove to be the material out of which relationships and interactions are embodied: 
much as in offline life with its corporeal bodies, digital bodies are used in a variety 
of ways - to greet, to play, to signal group affiliation, to convey opinions or 
feelings, and to create closeness. At a very basic level, bodies root us and make us 
present, to ourselves and to others. Avatars form one of the central points at which 
users intersect with a technological object and embody themselves, making the 
virtual environment and the variety of phenomena it fosters real. 








3.2.1 Presence 

Presence is one of the most elusive and evocative aspects of virtual systems - and 
yet it forms the very foundation on which immersion is built. It goes to the heart of 
what feels “real” and creates the quality of experience that signals to us “I am 
here”. Users do not simply roam through the space as “mind” but find themselves
grounded in the practice of the body, and thus in the world. Much like offline life, 
our sense of self, other, and space is constantly reinscribing itself as structures and 
relationships change. In virtual worlds, this same dynamic process occurs - but 
with a twist. The body through which presence is being constructed is not simply 
the corporeal one, but the digital as well. In multi-user worlds it is not just through 
the inclusion of a representation of self that presence is built. It is instead through 
the use of a body as material in the dynamic performance of identity and social life 
that users come to be “made real” - that they come to experience immersion. This 
grounding of presence consists not only of embodied practice, but of embodied
social practice - and this raises important theoretical and design implications for 
multi-user worlds. Understanding the ways a sense of “being there” is constructed 
as a social phenomenon might alter what we see as central to creating immersive 
systems (i.e., “bigger, faster, and more highly rendered” may not, in fact, be as 
central to presence as once thought). 

In graphical worlds the presence of the user is, at least initially, indicated by the 
images on the screen. While some spaces allow for a degree of hiding (for 
example, in The Dreamscape, “ghosting” allows you to turn your avatar into an 
anonymous “eye” form that sits in the upper right-hand corner of whatever room
you are in), it is typically the case that you see the avatars in the room and they are
present to you. It is impossible to forget that a user is in the room in a graphical
environment: you constantly see their form. The other side to this equation is that
the avatar comes to signal to the user their continued participation in the space. 
Unlike text-based worlds, in which presence is performed via conscious action (or 
signaled through a room listing), presence in graphical worlds is rearticulated to 
both others and self by the simple inclusion of an avatar. 

Aside from this very rudimentary formulation of presence, activities and games
which reinforce it abound in The Dreamscape. For example, greeting often occurs 
with a wave or a jump. People will also pace, hide, play tag, or even race with their 
avatars. In all of these ways the digital body is used to root the self in the space. 
This performance is not only for the benefit of onlookers, but it creates and 
confirms to the user that they are, in fact, there. 

Probably one of the most dramatic illustrations of the way presence is felt in these
spaces comes from an examination of personal boundaries. As Becker and Mark note
in their work, people typically report having a sense of personal space, and they
express body boundaries through the proximity of their avatars [3]. Different
environments have different norms, but users consistently identify personal
boundaries and have strong feelings about when they are violated. As one
respondent put it:


Placing your body [the avatar] in relation to another is the only 
real form of body language. It speaks to familiarity, to intimacy. 
to trust...to many things. The people who are more casual or even
completely oblivious to their place in relation to others seem 
exceedingly self-centered to me. 

Users will quite often move their avatars if their personal space has been invaded. 
Crossing boundaries can also be taken (and meant) as a sign of aggression. In the 
following image, one user has “gotten in another’s face” during an online 
argument. The confrontation is played out in the position of their avatars. The text 
above the avatars is their speech. While one group in the middle is carrying on a 
separate conversation, the two people on the right are having an increasingly 
volatile argument. You will notice that they have not only “gotten in each other’s
face”, but are also using their expressions to convey a strong emotion.



Figure 3.1 Avatar confrontation (Copyright Stratagem Corporation, reproduced 
with permission). 

While avatar positions can convey fighting, they can also signal intimacy or 
friendship. Sometimes avatar bodies inadvertently touch, but more often than not
avatars only touch each other if the user feels a friendship or more than casual 
connection with the other person. I would argue that when boundary definitions are 
not present (such as when people stand “on top of” each other), this is an indication
either that users are not fully affiliated with their avatars, or that the interface does
not provide users with accurate information about their body position. (In the 
image above, Figure 3.1, notice that the three avatars are standing very close 
together. These people are friends and share a kind of social relationship, also 
displayed in their body aesthetics). In addition to platonic friendship, proximity can 
signal other connections. Often users position their avatars right up against each 
other, sometimes with one avatar hiding behind the other person and the visible 
avatar’s back facing out. This signals a private conversation and often a public 
intimacy. 

In each of these examples I want to underscore several points. First and foremost 
is the way in which presence enacts itself as an embodied activity. It is through a 
performance of the body, in this case via the avatar, that one is rooted in the virtual 
environment. There is a material thing (albeit a digital one) that finds itself located 
in a space and moves through it, engaging in some way with objects and with 
others it encounters. In multi-user worlds, the power of embodied presence is also 
quite often directly tied to a practice of presence as a social activity. In this 
formulation, the inscription of self on the space becomes a socially-mediated 
experience. Through action, communication, and being in relation to others, users 
come to find themselves “there”. It is through placing one’s avatar in the social 
setting, having a self mirrored, as well as mirroring back, that one’s presence 
becomes grounded [4, 5]. As one user put it, “Avatar bodies don’t exist in 
isolation. They exist in context”. 

In the best of systems, this practice is seamless and consciously supported. It is
rare, however, to find such systems actually implemented. Errors or poor design
often conspire to disrupt presence. Graphical environments regularly unsettle the
user’s experience of body and space through glitches in the graphics. People
inadvertently walking through walls or suddenly disappearing are persistent
problems in many systems. This feeling of being suddenly pulled back out of the 
virtual world highlights the fragility of multiple forms of embodiment, especially 
in relation to the digital. Systems that rely on only one aspect (objects, to the 
exclusion of the social; or the social to the exclusion of the landscape and artifacts) 
risk constantly unsettling users’ attempts at fully involving themselves in the space. 
Drawing on a range of modes that foster presence is therefore central to good design.


3.2.2 Communication 

The actions that initiate and make up presence in the world are closely tied to both 
interpersonal and more social forms of communication (and later, as I will discuss, 
even to identity). While some aspects of presence can be performed and 
experienced privately, a great many are public events. The use of avatars in the 
argument above (Figure 3.1) was both a way of enacting a kind of presence in the 
space and of communicating. 

In addition to “speaking” via the text bubbles that appear overhead, the 
expressiveness of the avatar through movement and facial gestures is also used to 
display emotions and to communicate. This can be seen not only in the individual 
or spontaneous uses of an avatar’s facial expressions or body language, but in the
ways avatars can be used to communicate collective sentiments. 

In some instances, groups gather and use expressions to convey feelings, 
opinions, and even protest. Public mourning is not uncommon (generally following 
the offline death of a user) and avatars will often gather together and set their facial 
expression to “sad” - which will then typically guide the discussion amongst the 
group about the departed friend, as well as prompt recent arrivals to inquire as
to what has happened. Sometimes they will attach URLs to their avatars which 
inform the community of the loss and take viewers to memorial web pages. In 
these instances it is not unusual to find a particular location filled for an entire day 
with avatars looking sad, holding candles, and creating public memorial space. 

This strategy has also been used for protest. The following screenshot (Figure 
3.2) was taken at a rally in which users expressed their adamant desire for “turfs”, 
or personal apartments in the world. The use of shared clothing colors (in the color 
version of this image, all clothes are grey, while the rest of the environment is fully 
colored) and the clear expression of unrest via chanting, gestures, and facial 
expression all convey a strong sense of group solidarity at work. 



Figure 3.2 Protest event (Copyright Stratagem Corporation, reproduced with 
permission). 


3.2.3 Affiliation 

While memorials and protests typically represent one-time events in which avatar 
bodies are used to express collective ideals, more general or longer-term 
affiliations are also expressed via embodiment. Avatars can become a way to opt
into, or out of, a group. They can signal affiliation powerfully through their color
choices, bodies, accessories, and heads. For example, one prominent informal 
group in The Dreamscape is that of “vampires” and those who enjoy styling 
themselves with a gothic sensibility. Unlike in a text-based world where a player 
might adopt a particular code to support vampiric role-play, in The Dreamscape the 
performance takes place primarily through the look of the avatar. Dark colors 
predominate, and a particular head that has been coded gender-neutral (a
remarkable feature: this is the only human head that is seen as gender neutral) is a
favorite. Spraypaint that is now rare (no longer available to the general
population) has been gathered and saved over many years, and new group members
will often be awarded a “rare spray”, thereby coloring themselves with unique
shades of gray. While the fine distinctions between a “rare” gray and a common one
are likely to go unnoticed by outsiders, those within the group can signal their
“insiderness” with these kinds of avatar modifications. The special head, worn by
men and women alike, is also rare and quite costly to purchase in any of the
inworld consignment shops. Owning one also confers a measure of status and
certainly marks a unique customization in the eyes of the general population.

Animal heads make up the second most significant category in terms of informal 
affiliations in The Dreamscape. Cat lovers often signal their love of all things 
feline by wearing one of the handful of cat heads available. Since heads can be 
removed at any time and “pocketed”, and another easily put on, it is not uncommon
to find people switching heads based on particular social situations. Schroeder and
Axelsson have discussed the challenges that changing avatars can present to the
maintenance of persistent identity and trust in another online virtual environment,
Active Worlds [6]. While too much head swapping can be seen as disruptive,
because of the wider range of components in The Dreamscape for anchoring 
identity - not just an avatar and name as in Active Worlds, but accessories, colors, 
objects, distinct heads and bodies - it is not unusual for a user to come upon a 
group wearing bunny heads, for example, and to decide to put one on to join in. 

In addition to the informal affiliations that bodies can signal, more formal 
relationships can come to be coded in avatars. Not unlike the T-shirt that reads 
“I’m with her”, couples often customize their outfits (through color or accessories) 
and sometimes even match heads to signal their partnership with each other. This 
can also be done through the use of names which are inextricably linked to the 
artifact of the avatar body. A name in this space is not simply something you are 
known by, it is directly “inscribed” on the body. Clicking on an avatar will always 
show the user’s name. People will often choose a name that indicates group 
affiliation or partnership with another avatar - by adopting a name with a related 
theme. 

In turn, people make judgments based on how other users present themselves and 
the kinds of affiliations and identities they express through their avatars. As one 
respondent told me:


The fact that people do have avatars that I can actually see and 
interact with tells me a lot more about each individual than would 
a few lines of text in a chat window [...] I maintain that you can
tell a lot about a person by how they design their avatar, and how 
they move and interact onscreen. 


3.2.4 Socialization 

Beyond explicit signals of group affiliation, people use their online bodies to 
facilitate basic social interaction. They don’t simply chat in disembodied spaces, 
but use their avatars to gather for social events like weddings, community 
meetings, games, and simply hanging out. There have even been instances of 
theater and performance for the community to take part in. Joining together in
rooms and on street corners and sharing space is an important component of social
interaction in The Dreamscape. While in a “room”, users not only talk to embodied
others but experience them in an immediate way. In Figure 3.3, a group of people
have joined at a special location to hear the “White Mage” (the person in the center 
of the picture wearing robes) answer questions and tell special stories about the 
history and myths of the world. As you can see, the locale is set up to facilitate this 
kind of gathering, with stones placed around a central point. This arrangement of
people facing in, rather than out, is unusual for this world and creates a feeling of a 
story-telling circle. People come to this locale and literally gather around a speaker, 
participating in a semi-formal social event. 

There have also been instances of inworld classes, as well as ongoing Bible 
studies and worship services. A very different example of public performance 
using avatars takes place, interestingly enough, during religious online services. 
There was a fair-sized Christian community in The Dreamscape that held worship 
services in which people would raise their avatar arms, dance, and display many 
other forms of bodily expression quite similar to some offline behaviors. Schroeder, Heather and Lee
have also documented religious practices and sentiments in another virtual world 
(Active Worlds) and the ways avatar bodies are deployed in this setting [7]. In each 
of these cases, performing oneself through the avatar and using it as a vehicle to 
express participation and connection with others has been central to the creation of 
a vibrant world. As users will often say, inworld locations devoid of avatars feel 
empty and abandoned. 

There is also a range of activity and experimentation on the part of Dreamscape 
users when they engage their digital bodies in a playful fashion. One popular game 
is called ghost racing. In it, users change their avatars into the anonymous ghost 
state, which is indicated by an “eye” image in the upper right-hand corner of the
screen. (The image represents all possible ghosted avatars in that location.) A game 
host then puts a token or prize on the ground, and users “unghost” (become 
avatars) as quickly as possible. The first person to unghost and grab the item wins 
the round. Another game involves a scavenger hunt in which users run around the
world, looking for objects and clues hidden in it. There are also times when groups 
of people who have the “athletic female” avatar (a body type) will simultaneously 
initiate the “flying” motion, causing their bodies to flap their arms gracefully, float 
up the screen, and then drift back down. In a similar vein, the Japanese version of 
the world has seen the development of dance teams in which groups of people line
their avatars up and move them in synchronized fashion for a crowd of spectators 
[8]. Randy Farmer noticed an earlier version of this body behavior in groups in
the first version of The Dreamscape (called Habitat). Spontaneous collections of 
performers would often make a “wave team” which would perform “coordinated 
movement, gesturing, and typing, to create a sort of slow-motion dance that looks 
very much like cheerleading on Valium” [9]. 



Figure 3.3 Storytelling (Copyright Stratagem Corporation, reproduced with 
permission). 

Probably one of the most interesting, yet somewhat rare, forms of socialized play 
I have found is body swapping. Since changing one’s avatar costs money, this can 
be an expensive activity (which largely explains its infrequency), and when it
does occur, it tends to be a group event. People will get together and, often
prompted by one or two participants, visit the body-change machine and alter their 
avatars. Quite often this is as much an exercise in some form of gender swapping 
as it is in swapping body type. The experience (both personally and socially) of the 
disjuncture between the gender of heads versus bodies, as well as the types of 
gestures a particular body may provide, will on these occasions generate most of 
the interaction. 

These playful sessions are one of the few social spaces in which overt 
experimentation with gender is seen as a legitimate activity. While people gender 
swap privately (i.e., making an anonymous second character, often unbeknownst to 
friends), these events are unique moments in which people try on different bodies 
and genders publicly and amongst friends. Often they are simply moments to make
jokes that rely on stereotypes, but I have also seen them used as instances where 
people actually talk more seriously about the body types, reflecting on what feels
“right” or “like them”. The following screenshot (Figure 3.4) is taken from an 
incident of swapping that prompted a discussion about whether or not the “average 
male” body could be legitimately used as a female one. While all the women on 
this occasion converted back to their original avatars, several of them commented 
on liking this body once they tried it. 



Figure 3.4 Body and gender swapping (Copyright Stratagem Corporation, 
reproduced with permission). 


3.2.5 Sexuality 

In addition to these more public social events, people often also use digital bodies 
as a way of engaging in sexual practices. Given the basic limitations of this
world, the creativity required to do so is quite remarkable. Why do users engage in
such elaborate and creative play with their avatars? I suggest that it is because the
experience and presence evoked in these environments is powerful. As one woman 
put it, “When I get an appropriately placed [online] hug, I really feel the rush of
endorphins”. Thinking about how private interactions around sex occur online 
raises some important issues. People are able to engage with the world and with 
others in ways that link their corporeal body to their digital one. The nature of 
eroticism and sexuality online is probably one of the most underexplored aspects of 
internet experience. When online sex is discussed, it is often done in a 
pathologizing or humorous manner. People who engage in “cybersex” are often 
seen either as lacking “normal”, healthy sex lives, or participating in a form of 
lying. What I would like to suggest is that the practice of sex in internet spaces is a
common aspect of embodiment online and that neither the abnormal nor comic 
ways of discussing the matter go far in describing the nature of sexuality in this 
medium. 

Depending on the world, the form of sexual activity varies somewhat. In most 
worlds, however, the textual component of erotic practice is primary whereas the 
avatar itself generally plays a supporting role. Nevertheless, initially the avatar 
often acts as an image that evokes desire and attraction. Despite the arguably 
limited palette of current graphical systems, people remarkably still find particular 
avatars, aesthetics, and styles distinctive from others and uniquely attractive. One 
user, for example, commented that “We can look so good here” and later added “I 
like the look of some avatars”. Thus the bodies themselves act as agents of 
engagement. Stone has suggested that virtual world users “have learned to delegate 
their agency to body-representatives that exist in an imaginal space contiguously 
with representatives of other individuals. They have become accustomed to what 
might be called lucid dreaming in an awake state” [10]. In sexual activity this 
delegation is quite striking, with avatars and language acting as both central
players in, and conduits of, corporeal experience.

Deploying an avatar via actions for sexual activity is much trickier, however. In
The Dreamscape there is a limit on the range of movement of avatars. The standard 
body gestures are wave, bow, shrug, present, jump, react, and a special action 
customized to the particular avatar. This range of action presents some challenges 
for actually using avatars in any significant way for sexual activity. People tend to 
use the limited range of motion in creative ways and those actions (and the avatars 
themselves) then become place-holders of sorts, signifying something more, 
something that becomes enacted via textual emotes. Users can position their bodies 
such that it appears they are holding hands, kissing, or sitting on one another’s 
laps. Some of the more explicit actions include bending in front of another avatar 
(through the “bow” gesture) or using a gesture which, when done in close 
proximity to another avatar, looks like a pat on the rear. Ultimately the software is 
somewhat prohibitive for the sexual activity of avatars. Moving the avatar with any 
degree of precision can often require using a mouse (and thus suspending typing). 
For many users then, sexual activity is made up of a handful of symbolic or 
placeholder positionings, supplemented by textual emotes and speech. The way 
that language becomes interwoven with the visual symbolism of the avatar is 
particularly interesting here. 

One of the more creative examples I’ve found in the use of avatars for sexual 
activity relates to the long-standing practice of prostitution and stripping in the 
world. The following account was given by a man who found himself 
propositioned to act as a stripper for an inworld party, and how he ended up using 
the avatar and objects to create this performance: 

I bought a cowboy hat and a length of rope at vending machines. 

Since I could carry six items in my pocket, I purchase different 
colored roses to represent shirt, belt, boots, pants and G-string of 
Cowboy Roy [his avatar]. The sixth item was a fern. The act 
consisted of Cowboy Roy moving from place to place about the 
locale and me describing his fluid movements and general
appearance. From time to time he would pass too close to one or 
another of the spectators and she would “grab” (be given a rose to 
represent) some item of clothing. Roy would spin away across the 
floor in a teasing manner. Finally, when he was down to his
G-string and the description of his flowing muscles and tight buns
etc. was done, one of the bystanders was given the final rose at
which point he quickly “covered himself” with the fern. In an
uncautious moment he danced too close to the person in whose
honor the party was being given and she was able to “grab” the
fern away. This was a signal (pre-arranged with the turf owner)
for Roy to be evicted from the turf and hence go “poof” and end
the act. 

He added that when word got around about his show, he was hired on several other 
occasions. What is fascinating about this story, and what makes this a wonderful 
example of embodiment, is the complicated mix of speech and symbolism 
combined with the artifact itself, that must occur for the performance to be 
successful. There is also a way in which, while the designers of the world clearly 
did not build the avatar bodies to facilitate particular kinds of interactions, users 
push back on the system and make more out of it than was originally intended. 
The sexual lives of users are rarely considered in design, so sexual practice
tends to be enacted through a pastiche of multiple types of software (for example,
augmenting a virtual world with a video feed or using an entirely different 
environment where the digital body has a greater range of freedom), and language 
is used to extend the limits of images. 


3.3 Personal Identity 

While presence, social integration, and communication form powerful aspects of 
embodiment online, identity remains one of the most evocative uses of an avatar. 
Ultimately, digital bodies tell the world something about your self. They are a 
public signal of who you are. They also shape and help make real how users 
internally experience their selves. 


3.3.1 Customization 

The act of changing heads and customizing an avatar is something most users 
spend an enormous amount of time doing. Their avatar acts as a mode of personal 
expression which is constantly being worked over. This activity plays a central role 
in becoming an individual and making the body real. In a space like The 
Dreamscape where avatars are drawn from a fixed library, users will often run 
across others wearing similar heads and bodies. Establishing a unique identity then 
becomes tied up in naming and customizing an avatar. These two processes not 
only serve a personal function (individualization) but also a social function - it is
easier to recognize and remember people over time. Instances where users attempt
to copy exactly another’s avatar (or to use a slight variation on their name) are
generally taken as an offense. One user told me that there had previously been a 
“rash of impersonations” and that “folks were furious. Folks work long and hard to 
establish an identity, and they get really upset when someone tampers with it”. One 
of the more bizarre twists on this theme was an incident where one of the top 
administrators of the world found out that her avatar (which took the form of a 
robed oracle) had been copied and the graphical representation inserted into a 
competing virtual world by another user. Interestingly enough, it was this incident 
that brought to light a peculiar legal feature of her digital body. She explained, “I
came to work and found out that I was copyrighted and the way I found this out 
was someone had stolen me and now there was a legal fight over me - my body, 
my head, and my name”. 

Even if users were role-playing or acting with secondary characters that were 
being kept secret from close friends, the desire to make their avatar unique and 
identifiable prevailed. Customization, however, is not simply a pragmatic issue. It 
is not just that users want to be recognizable, but that the look of their avatar, the 
form their digital body takes, becomes tied to deeper questions around identity. 


3.3.2 Getting to “Me” 

Ultimately the question of which body is most evocative to a user is very personal. 
What mattered to most users I spoke with was how much the representation allows 
them to immerse themselves in the environment - how much it feels “right” and 
fosters their connection to an avatar. A large part of this feeling of a body being 
“right” is tied to how well it allows people to construct, express, and perform the 
identity they are seeking. 

The case of Meg provides an interesting example of the complicated ways in 
which digital bodies come to be tied to identity. Meg is a long-time participant in 
The Dreamscape, having been there from the early beta test days. When she told 
me about her initial experience of putting together her first avatar, she expressed a 
common sentiment among participants, that the act of creating an avatar is in large 
part focused on getting to the “that’s me” stage. (Of course, what gets defined as 
“me” often changes over time and with experience, as further discussions will 
illuminate). In this world the process entails trying out different heads, bodies, 
colorings, and even accessories. Throughout her initial experimentation she 
reported that she “couldn’t find a head that suited [her] personality” and that “none 
of the human heads felt comfortable”. As mentioned previously, one of the most
popular types of heads in the world is the cat head, and Meg, being a “cat nut in
RL [real life]”, was thrilled to learn that she could “become a cat for a while”. 

The meaning, and effect, of this form of embodiment provoked some fascinating 
reflection for her on the subject of identity. She wrote to me:


Although it wasn’t a particularly conscious process at the time, 
choosing an animal head instead of a human one was a way of 
giving myself more leeway in my inworld actions, and absolving 
myself of some of the responsibility of “acting human”. It was 
also somewhat of a protective measure, a way of not getting too 
close to people until I really knew what I was getting into. 

If people didn’t like me as a human, it would be a definite 
reflection on my waking world self. If they didn’t like me as a cat, 
somehow that wasn’t as serious an issue because after all... I’m 
not really a cat, so it’s not really me they don’t like. 

All those thoughts didn’t go through my head at the time of 
course, but later, when I decided to try a human head for the hell 
of it and found that I felt *very* uncomfortable without my cat 
head. I ended up going back to a cat head for a few months before 
I finally felt comfortable as a human. 

This feeling of discomfort was something I heard repeatedly in
interviews. Her statement about feeling “finally comfortable as a human” is 
particularly poignant and reveals how avatars can foster different associations and 
forms of self. Another user expressed a similar feeling of being out of sorts, of not 
feeling entirely comfortable in their (avatar) skin, and the anxiety caused by this 
experience. 

I remember wearing the Watson head for a while one day... 
people were coming and going and I was just talking away when I 
noticed I was feeling kind of anxious. The feeling didn’t go 
away... I went afk [away from keyboard] for a few minutes but 
when I came back it was the same thing... and it got worse. I was 
sitting there at the keyboard actually feeling uncomfortable. I put 
the dragon head back on and I immediately noticed I felt much 
better. I have always thought, since the minute I put the head on, 
that the dragon head was “me”. 

In both instances, the avatar head became a central object around which some 
performance of identity was structured. For Meg, the cat head provided a kind of 
“material” from which she was able to create a sense of self within the world. It
allowed a certain playfulness, and, in view of the associations people commonly 
have with certain types of heads, a social connection to others. At the same time, 
this allowed a certain distancing and acted as a kind of boundary device. Once she 
became more familiar with the space and had made more friends, she began to drift 
toward using human heads. She said that as she began to use a human head (“adjust 
to being human”), she was surprised to find that the cat head “didn’t feel 
‘complete’ enough” anymore. 

I queried her about whether she thought cat heads only facilitated play, or if she 
found that serious conversations and connections had occurred while she was 
embodied that way. She replied, “Well, interesting you should ask that because I 
was just about to say that since I became human [referring to using a human head 
inworld] I’ve found that interactions with others are quite a bit deeper. On my part
anyway”. While part of this change is due to her own internal reorientation, it is 
worth considering how much the actual form of embodiment can influence 
particular kinds of personal or social engagement. Like all objects, the artifact of 
the avatar is located within a system of meanings and values which will have an 
impact on how it is experienced and received. Meg still has several heads she uses, 
generally according to her moods. 

I have a favorite human that I use the most now. [The] cat and 
lion are more for playful moods. I seem to connect more with 
people as a human and people open up more. Whereas as an 
animal... it’s more of a surface thing. Lots of fun... but not all 
that much depth. 

This use of avatars to engage in different types of social situations, or to perform 
different aspects of one’s self, also extends to the objects and accessories used. 
One respondent described the way she uses two different objects to signal her 
“approachability”. She wrote, “The witch hat was perfect for me once I found a 
head that both suited the hat and my personality, and in my mind it also portrays a 
bit of a mystery as well, a slightly ‘dangerous when crossed’ <g> [grin] aura. I 
wear it when I’m [doing] business and also when I want to appear less 
approachable to those that don’t know me well. When I’m in a more gregarious 
mood I wear the same head with a holly wreath which to me appears a lot friendlier 
and softer looking”. Avatars can thus be reflective material, used to explore both 
one’s inner self and the social world. As Meg put it, “[I] usually change my av 
[avatar] to suit my moods, or to experiment with others’ reactions to different 
appearances, or to see how different looks affect my own actions and comfort 
levels”. 


3.3.3 Avatars as “Truer” Reflections 

In one of the more complicated twists on the subject, some users have even come 
to identify their avatar as “more them” than their corporeal body. One man 
described what he thinks about when he sees his avatar, Leonardo, 
on the screen facing him and interacting with others. 

I identify this brown cat as me more than I identify my picture 
with me. I see Leonardo more often than I see myself in the 
mirror or anywhere [...] I can’t see “me” in the WW [“waking 
world”] but I can see “me” in the DS [Dreamscape]. When I look 
at the brown cat I know I am looking at me and also that everyone 
else who sees that brown cat also sees me... I like that 
continuity... I take comfort in it. 


This feeling that somehow you not only project yourself into your digital body, but 
that you are actually made most real, most true through it, is something I have 
heard from a number of users. This is similar to the kinds of phenomena Turkle 
reports among MUD users. She quotes one in particular who says, “So even though I 
play more than one self on MUDs, I feel more like ‘myself’ when I’m MUDding” 
[11]. What is interesting is how this translates into bodies in graphical worlds. In 
this form, users suggest that the corporeal can no longer “corrupt” the truth about 
who they are and people often say that it was through their avatars that they found 
a “better” version of themselves, one that felt even more right than their offline 
body. 


3.3.4 Experimentations 

While some people make conscious choices to have their avatar reflect their offline 
self and corporeality in some way (or to present a version they feel is more 
authentic), others experiment with embodying themselves in unfamiliar ways. In 
these instances, people typically speak of those avatars in the third 
person. One man described it this way, “George and Wendy [two of his avatars] 
both live out different identities and I thought I would lose something important if 
they were conflated”. Another person spoke about the ways he used the space to 
experiment with and think through how different constructions lent themselves to 
different kinds of interactions. Because of the nature of the environment (being 
able to scroll back through conversations to review them, standing somewhat 
“outside” oneself by viewing your avatar from a third person perspective, etc.) a 
level of both reflection and surveillance is introduced that some regard as a unique 
opportunity. As he put it: 

I’m not particularly interested in “connecting” with my avatar, so 
much as I’m interested in the ability to see myself as others see 
me. I’m obviously familiar with the motivations of my avatar, but 
actually SEEING [emphasis his] that behavior, and being able to 
scroll back and review my interaction with others in the group, is 
an opportunity that is rare in the Waking World. 

Interestingly, most users I spoke with made connections between their 
experimentations and their offline identities. The George/Wendy user actually 
suggested that all performances invoke an aspect of experimentation. As he put it: 

Actually, I think role-playing with an avatar is a part of avatar 
construction. The avatar doesn’t come in to the world ready made, 
but rather as a blank canvas. The role-playing is sketching a 
personality on that canvas. I suppose that this is the process by 
which you (unconsciously) decide which facets of the Ratava 
[offline self, “avatar” reversed] get embodied in that particular 
avatar. Some you reject as not really being what you want that 
avatar to be, others get incorporated into that avatar’s identity as 
you perceive it. 
This process of working through constructing an avatar, sometimes by venturing 
into role-play, leads the most thoughtful of users to reflect upon the kinds of selves 
they perform. As this same user later put it, “I see being an avatar as sort of a 
long-term self-exploration and even self-reconfiguration”. 


3.3.5 Avatar Autonomy 

Despite the best intentions of users to actively construct identities, they just as 
frequently report a feeling that there is something about employing an avatar in this 
performance that lies outside of their control. The impression is often that the 
avatar has some independence apart from the user. Ideas about avatars being 
“almost autonomous” are typical. While the avatar may express some aspect of the 
user, people often report a sense that they can’t quite control or predict what their 
avatar will do - what situations or identities will emerge. One user told me, “[Y]ou 
are kidding yourself if you think you will be able to control or even predict what 
will happen to your avatar. It is the ultimate learning experience”. It strikes me that 
these comments touch upon a phenomenon equally common in offline life. We 
exist in social and cultural contexts that often have profound effects on our 
identities and bodies despite our intentions or wishes. It is this social production of 
self and body that I think users are tapping into when they discuss avatar 
autonomy. This experience is simply more pronounced for most users, either because 
they may not have such clear examples of this dynamic in their offline lives or 
because of the reflective distance provided by an avatar. 

When formulated this way, the bodies and selves people create in these worlds 
have some rooting outside of the user, in the social world. When I inquired of one 
man whether he thought you could change your avatar body and yet maintain the 
same identity he replied, “I’m going to throw you a curve on this one. You CAN 
NOT [emphasis his] maintain the same identity. Avatars have a mind of their own, 
and they grow in unexpected ways”. 

In large part this phenomenon arises because the “understanding” 
and social context of any given body may turn out to be quite different from that 
intended by the user. Users may also not anticipate how a particular avatar will be 
“read” by the community. Identities and bodies are not constructed in a vacuum but 
are given meaning, as well as being supported or challenged, in social contexts. 
Avatars often become an artifact that teaches this lesson. One of my favorite 
examples of a user coming to terms with this is from a short article written for a 
community newspaper. The author wrote: 

But I have experimented quite a bit, and the one thing that I’ve 
found most interesting is that people treat you based on how you 
present yourself, and, if you pay attention, you’ll notice that 
*you* change depending on how you present yourself [12]. 

The author went on to describe their experience of being seen as technically savvy 
and the links between that assumption and the use of a male avatar. Interestingly, it 
wasn’t simply that others used the body to support a particular stereotype about 
technical competence, but that the author also felt that this body legitimized a 
particular identity. The author wrote, “If Cosmocat had been female, I’ll bet not 
only would he not be accepted for his skills, but I wouldn’t have felt comfortable 
pretending I knew stuff I didn’t. As Cosmocat, it didn’t matter, I just did it anyway, 
because Cosmocat had guts that my ratava didn’t” [12]. 


3.3.6 Making Sense of Plurality 

The issue of how to reconcile the different selves and bodies we find both online 
and offline is something users are always working through. While most experience 
a lot of pleasure in creating their avatar and experiencing the development of 
identity through it, all users are confronted with having to make sense of their 
immersion. Frank Biocca [13], in his work on virtual systems, has approached the 
tangled mix of bodies we find online and offline by breaking them down into three 
categories - the virtual body, the physical body, and the phenomenal body. He 
suggests that the phenomenal body (our body schema or the “mental or internal 
representation” of our body) is not stable and that “media can radically alter” it. He 
writes: 


It appears that embodiment can significantly alter body schema. 
Metaphorically, we might say that the virtual body competes with 
the physical body to influence the form of the phenomenal body. 

The result is a tug of war where the body schema may oscillate in 
the mind of the user of the interface [13]. 

I have consistently found users right in the middle of this tug-of-war. Their avatars 
and online identities often seem to have real import to their offline life, and they 
also fluctuate in their use of third and first person language to describe their 
experience. Often they move between feeling that the avatars are simply an 
extension of themselves to feeling that the avatar and its life are very much “not 
like them.” Yet the other important thread Biocca is pointing to is the way in which 
experiences in virtual worlds can actually reshape users’ sense of their bodies (and, 
to take the argument further, their selves). As one user recounted when talking 
about the relationship between his avatar and his corporeal body, “When my 
arthritis is not acting up and I can move freely, I sometimes lengthen my stride and 
move purposefully through malls and shopping centers the way I think he [his 
avatar] would”. 

Of course, not all users have these deeper experiences from their time online. 
One woman stated it clearly when she said, “the connection between our offline 
selves and our avatars is a lot more meaningful for some than it is for others”. 
However, when they do engage with the space actively, people report some 
fascinating things. She went on to say that “the more time I spend inworld...the 
harder it is for me to differentiate between my inworld self and my offline self [...] 
The two seem to be merging with each other and it’s actually a pleasant experience 
for the most part”. Conversely, role-play also provides interesting benefits. As one 
respondent said, “That, of course, is the beauty of virtual reality worlds. I can try 
out behaviors that I am afraid to try in the real world and see how they feel. Then I 
either take them back to the real world or discard them as unworkable”. This 
ability to shift is made real through the material of bodies in the spaces. As she 
went on to say, “I think the animal heads remind me that I am free to be someone 
entirely different”. 


3.4 Bodies with Limits 

The people who are working through these issues, working with the material of 
digital bodies, face some of the most complex ideas that new media present to us. 
They raise questions about what our bodies are, who we are, and what we can be 
“virtually”. As often as not users are trying to sort through what it means not only 
to have distributed bodies and selves (that matrix of offline and online creations), 
but what to make of those instances where it feels like their avatar has taken on a 
life of its own. They are sometimes challenged by how new forms of embodiment 
push them to think about their corporeal bodies. Or again, they sometimes find, or 
create, an aspect of themselves that was previously unrealized. Ultimately, these 
moves raise the stakes on what the nature of these spaces is. If I can embody, I 
can be made deeply real. 

However, if we accept the powerful role embodiment plays in helping to foster 
identities and social lives, then we must attend to the ways avatar systems also 
often limit and constrain interesting and progressive possibilities. As I mentioned 
early on, presence is regularly undermined by poorly executed systems, or those in 
which designers have not paid full attention to the complex ways bodies can be 
formulated. Virtual world systems also carry design decisions which reflect deep 
links with particular world-views and value systems. Quite often worlds carry 
explicit visions about how the space should operate and what kinds of citizens 
users should be [14]. While the social performance of gender can take on 
fascinating nuances as users “rewrite” objects and avatars for their own ends, The 
Dreamscape, for example, continues to operate within a very specific gender 
dichotomy which will always inform and structure the possibilities for identity in 
particular ways. 

Beyond the structural limitations on avatar bodies, social ones exist as well. 
Nakamura [15] has given us an important view of how this works in text-based 
spaces. She found that often the kinds of experimentation in which people were 
engaged amounted to a form of “identity tourism” in which users were not involved 
in progressive explorations of self-construction but instead relied on stereotype and 
caricature that allowed a kind of unreflective appropriation. Underlying these 
performances were assumptions about what kinds of bodies and identities were 
deemed legitimate. 

Such questions appear in graphical worlds as well, at least when the structure of 
the software doesn’t foreclose them. For example, the performance of queer 
identities (and bodies) is often quite contested in such spaces, either publicly or 
privately through anxiety about the “real” gender and sexuality of another user. 
This is probably one of the trickiest areas for people to come to terms with and the 
conflation of gender with sexual preference becomes quite complicated. Even for 
those who limit their sexual encounters with other users to online interactions, the 
question of the importance of offline gender can be confusing. While some people 
regard offline gender as unimportant to online attraction (as in the case of the user 
who said, “If you have no intention of taking a relationship into the Waking World, 
why does the actual sex of your partner matter? If they turn you on, they turn you 
on”), others speak of the caution, anxiety, and trust that must be given over in the 
hopes of not being “duped”. 

The anxiety around sexual orientation is one area in which we can see that not all 
body performances are weighed equally in virtual worlds, and we should be 
cautious about over-stating the “freedom” such spaces afford. In the following 
account, users are gathered in a public space to participate in something called an 
“Avatar Auction” in which users bid on inworld “dates” with other users. One of 
the participants was an openly gay man and he began by playfully teasing about the 
underlying assumptions in the event. 


Michael: WOMEN ON MEN? 
Michael: ICK 
Starchild: Shh 
Starchild: [giving instructions to the participants] If all women would please ghost and all men come down 
Michael: You gonna let guys bid on guys and women bid on women? 
Bluebird: haha 
Starchild: No 
Michael: That really sux 
Michael: Bigot 
Michael: I’m going to go get my female avatar! N’ya n’ya 
Starchild: LOL 
Starchild: [after Michael left] He is strange 

Though Michael started out teasing and later jokingly proposed an interesting way 
of overcoming the constraints (one that only highlights the ambiguity!), he later 
told me he was very unhappy with the organization of this event and actually 
complained to the world management (who took no action). Interestingly, the 
software used for The Dreamscape was also licensed by another company for a 
virtual world specifically for gays and lesbians. Called Pride! Universe (and later 
Queery), it explicitly legitimized particular identities and bodies and offered an 
interesting alternative to many other graphical worlds. Unsurprisingly, Pride in 
many ways came to be regarded as the progressive counterpart to The Dreamscape. 
While there were users who were in both worlds, because of the monthly fees for 
each, people generally had to make a choice as to which space they were going to 
give their energy to. In the long run, the division of worlds in this way (one 
explicitly gay-friendly and one in which a real diversity was often limited) had a 
certain kind of cost. The larger question about what bodies and identities are 
legitimated in any given system got answered (unsatisfactorily, I would argue) 
through the idea that some forms were more or less “appropriate”. This strikes me 
as not dissimilar to the story Julian Dibbell recounts about LambdaMOO’s Schmoo 
wars. In that instance, the very question of legitimate sexual activity, and in turn of a 
particular form of embodiment, was at least in part at the heart of the community 
wrangling over a bit of code [16]. Pride has recently closed, and while the reasons 
are complex, there has been a subsequent loss in the range of embodiment and 
identity users can perform, at least publicly. 


3.5 Conclusion 

The limits, both structural and social, on the kinds of avatars people are able to 
create and use are important for our understanding of embodiment in virtual worlds. 
Avatars are in large part the central artifacts through which people build not only 
social lives, but identities. They become access points in constructing affiliations, 
socializing, communicating, and working through various selves. They are the 
material out of which people embody and make themselves real. What they are and 
what they can be matters. 

While a good portion of what has been written about virtual environments so far 
has focused on communication or identity, I hope to have shown how, in specific 
ways, the avatar as a body is woven into the structure of life in these worlds. It is 
through embodied practice that selves and social life are grounded in multi-user 
spaces. Mikael Jakobsson proposes that we should take virtual objects and spaces 
seriously. He has argued that “the inanimate objects of a VW [virtual world] are as 
real as objects in the physical world although different” [17]. The “symbolic 
significance” (as he puts it) that digital objects carry with them lends itself to 
real relationships, interactions, and values. Certainly we must include avatars, or 
digital body objects, in this category. 

This realness of pixels, the materiality of bodies online, and the importance of 
experiences in these worlds continue to be subjects that users of all types and 
levels of virtual world participation wrestle with. Researchers and theorists should consider 
how simple divisions of “virtual” and “real” may not prove to be very useful in 
accurately explaining what happens in multi-user environments. Instead, we might 
see what happens when we broaden our notions of embodiment to include both 
corporeal and digital forms. Given the slippage between on- and offline life, the 
stakes are high in sorting through these questions. If, as one user told me, “being 
inworld has actually affected my waking world life quite a bit”, then what people 
are in these spaces, how they are embodied, and what they can do, become central 
to thinking critically about life online. 
Chapter 4 


Rest in Peace, Bill the Bot: Death and 
Life in Virtual Worlds 

Mikael Jakobsson 


4.1 Introduction 

You are about to read a story of crime, deceit and punishment. The story takes 
place in a virtual world, but it is by no means a fictional story. All the characters 
portrayed have existed and the events recollected have actually happened. They are 
taken from the real everyday life of a virtual world. A virtual world is a virtual 
place that is persistent over time - unlike the environments of networked games 
like Quake - and it is accessible by many people at the same time. These people 
have to have some kind of self-representation, so participants can see each other, 
unlike the simultaneous visitors of a website. By calling it a place, I have implied 
that the system has to be based on some kind of spatial metaphor, unlike an 
electronic message board for instance. It can be a text-based system - but in this 
case it is graphical. My reason for telling this story is to point out some important 
aspects of the nature of the social interaction in this kind of setting that are easily 
overlooked by anyone who does not have extensive first-hand experience from 
participation in social virtual worlds - but I will save my analysis for later. 


4.2 Setting the Stage 

Shakespeare told us that “all the world’s a stage”. The world that comprised the 
stage for this particular drama was a virtual world that was my own creation. I had 
named it the Virtual MIT House after the Mathematics and Information 
Technology House at the Umeå University campus where I worked. The world was 
used in teaching at my department, for meetings between physically distributed 
researchers, and between researchers at the department and their families when 
the researchers were traveling abroad. It was also used for recreational purposes. I 
encouraged different kinds of activities in my world in order to be able to conduct 
ethnographic studies in my own backyard, as it were. The Virtual MIT House was 
part of a virtual universe called the Palace. As of January 2001, the company 
running the official Palace servers shut down its services, but since the Palace 
technology is distributed, other Palace servers still work and many servers are still 
in operation. 

The Palace is a system for making and using two-dimensional graphical worlds. 
In these worlds, people are represented as small images called avatars 
superimposed on often cartoon-like backdrops. The citizens in these worlds belong 
to different classes. At the bottom of the hierarchy are the guests. They are 
restricted to using avatars that look like smileys and they are assigned a new 
generic name every time they enter a Palace world. Registered users become 
members and can choose their own name and appearance (see Figure 4.1). 



Figure 4.1 The Palace. 


To help me in the continuous development and care-taking of the world, I had 
two wizards whom we shall call Neo and Trinity. They were both American 
teenagers who spent much of their time in the Palace. I had given them wizard 
status, which means that they could add and delete rooms. They also had the ability 
to enforce the rules of social conduct among the members and guests, including the 
use of force if they found it necessary. They could, for instance, make a person 
unable to speak or move, or kick a person out altogether. 

As the owner of the server, I automatically enjoyed the highest set of privileges 
in the world. I was a god. Besides having all the powers of the wizards, the god can 
also turn the world on and off, and appoint new wizards or revoke wizard privileges 
if a wizard does not behave as the god sees fit. The god, in other words, 
as the name clearly implies, is an almighty ruler of the world. 
Apart from my two wizards, there was also a third member on my staff. His 
name was Bill and he was a bot, i.e., a character controlled by a computer script 
instead of by a human. Bill served as my bartender. In addition to serving beer, he 
could also answer questions posed to him in a more or less intelligent manner. The 
fact that Bill had a distinct resemblance to another well-known Bill gave an added 
satisfaction to being served by him - since I sometimes felt a bit like a servant to 
this other Bill. The resemblance also served as a basis for the humor I had tried to 
code into his responses to the questions people asked him. If, for instance, someone 
asked him whom he loved, he would answer that he loved only money. And if he 
was asked if he knew what time it was, or any other question containing the word 
“know”, he would answer “No, but I know all the shortcuts in Word”. 
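Bill’s question answering, as described above, amounts to keyword-triggered canned replies. The following is a minimal sketch of that idea in Python, offered purely as an illustration: it is not the script Bill actually ran, the rule list and the wording of the “money” reply are hypothetical paraphrases, and the first-match-wins ordering is an assumption.

```python
import re
from typing import Optional

# Hypothetical keyword rules, checked in order; the first match wins.
# Word boundaries (\b) make "know" match the whole word, not e.g. "knowledge".
RULES = [
    (re.compile(r"\blove\b", re.IGNORECASE), "I love only money."),
    (re.compile(r"\bknow\b", re.IGNORECASE), "No, but I know all the shortcuts in Word"),
]


def bill_reply(question: str) -> Optional[str]:
    """Return the canned reply for the first matching keyword, or None."""
    for pattern, answer in RULES:
        if pattern.search(question):
            return answer
    return None  # no rule fired; a real bot might fall back to a default line
```

For instance, `bill_reply("Do you know what time it is?")` returns the Word joke. Because rules are checked in a fixed order, a question such as “Do you know whom you love?” gets the money answer, since the “love” rule is listed first.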

There are two more main characters in this story. As with my wizards, I have 
given them fictitious names to protect their identities. I have called them Bart and 
Lisa. I will not give them any further introduction here. Instead, I will begin telling 
my story. 


4.3 Killing Bill 

One day when I entered the Virtual MIT House I found to my astonishment that it 
had been severely vandalized. Someone had apparently got hold of the wizard 
password and used it to delete parts of the house, alter other parts, and write some 
very unflattering remarks on the walls. The bar was one of the rooms that had been 
deleted, and since the script and graphics that constituted Bill were tied to this 
room, he was also gone. Thus I had a potential virtual murder case on my hands. 

I was more than a little irritated when I realized that Bill was gone, but above all 
I felt curious. What had happened and why had it happened? I decided to initiate a 
little investigation into the incident. I started with a look at the server log. 

The system keeps track of some of the activities in the world such as when 
someone enters or leaves the world, tries to attain god or wizard privileges by 
entering a password, or when someone tries to add, extract or change something in 
the world. These activities are recorded in the server log together with the 
nickname, Internet Protocol (IP) number, and time. As one might expect, the two 
perpetrators had not used their ordinary names, but the log still held the key to their 
Palace identities. The break-in had been committed by returning visitors, and the IP 
number uniquely identifies a computer connected to the Internet. So all I had to do 
was to match the IP numbers from the break-in with the rest of the server log to 
find the names they ordinarily used. 
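The matching step described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the Palace server’s actual log format or tooling: the `LogEntry` record, its field names, the event labels, and the sample nicknames and addresses (taken from the 192.0.2.0/24 documentation range) are all assumptions. Like the text, it assumes that an IP number stays tied to one visitor’s computer.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class LogEntry(NamedTuple):
    """One hypothetical server-log record: time, nickname, IP number, event."""
    time: str
    nickname: str
    ip: str
    event: str


def usual_names(log: Iterable[LogEntry], aliases: Set[str]) -> Dict[str, Set[str]]:
    """For each IP used under a break-in alias, collect the other nicknames seen from it."""
    entries = list(log)  # the log is scanned twice, so materialize it first
    suspect_ips = {e.ip for e in entries if e.nickname in aliases}
    names_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for e in entries:
        if e.ip in suspect_ips and e.nickname not in aliases:
            names_by_ip[e.ip].add(e.nickname)
    return dict(names_by_ip)


# Hypothetical sample log: two regular visits, then the break-in under aliases.
log = [
    LogEntry("09:00", "RegularA", "192.0.2.5", "enter"),
    LogEntry("09:30", "RegularB", "192.0.2.9", "enter"),
    LogEntry("23:10", "Vandal1", "192.0.2.5", "wizard_password"),
    LogEntry("23:12", "Vandal2", "192.0.2.9", "delete_room"),
]
print(usual_names(log, {"Vandal1", "Vandal2"}))
```

The sketch inherits the one-computer-per-address assumption; when several visitors connect through the same internet service provider, as happens later in this story, such a match narrows the field rather than settling it.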

Judging by these names they were probably a boy and a girl. I will refer to them 
as Bart and Lisa. I vaguely remembered Bart from a treasure hunt that my wizards 
had arranged in the Virtual MIT House. This had been one of the occasions when I 
had used video to record the interaction, so I even had him on video! To see myself 
happily chatting with the person I suspected of vandalizing my world was a weird 
sensation. It felt a bit like seeing a bank robber caught by a surveillance camera 
while scouting out a bank before a hit. Although I had talked to him on several 
occasions and had him on video, I did not have his email address and could not 
think of any way to get in touch with him. 
As for the other perpetrator, there were two frequent visitors that used the 
internet service provider through which Lisa had connected during the break-in, 
and one of them was Neo! Could it be that my wizard sometimes logged on as a 
female and sometimes a male, and that he was in fact Lisa? That would explain 
how the vandals had been able to log on as wizards, but there were some things 
that did not quite fit. Lisa seemed to be a girl and Neo was a boy, and the log gave 
the impression of a person that was inexperienced in the role of wizard, while Neo 
was a real expert. 

However, all of this could of course be just a clever trick from Neo, so I emailed 
him to tell him what had happened and that he was my prime suspect. I figured that 
if he was not Lisa, he should at least have some information as to who she was. I 
also told him that I was going to put their whole domain on the ban list, thereby 
denying all users of that internet service provider entrance to the Virtual MIT 
House. He answered that he was sorry about what had happened, but that he knew 
absolutely nothing about it. 

My investigation had reached a dead end, just like in so many detective stories. 
However, as in many stories, that was when I got help from a very unexpected 
source. I received a mysterious letter (in the form of an email) that read, “Hi Mjson 
I am Fred, Neo’s father. He uses my email, so I have decided to become involved 
in this issue. I met Trinity the day before Neo did and she was going around 
offering a lot of people the prospect of wizardship at V MIT. One of those people 
was Bart!!!”. 

This story takes place in a time (some five years ago) when not every teenager 
had his or her own email address. So Neo had been using his father’s address in his 
communication with me, and apparently his father had kept an eye on our 
correspondence. He was himself an avid Palace user and, although he had not got 
all the details right, his information was crucial to cracking the case, especially the 
email I received three minutes after the one quoted above. It simply read, “One 
more note. Lisa is Neo’s sister”. 

Fred later mentioned that it was when I threatened to cut off the access to the 
Virtual MIT House for the whole family that he decided it was time to step in. He 
explained that Lisa was actually Neo’s eleven-year-old little sister and that she had 
probably not been fully aware of the results of her actions. What I initially had 
believed to be a murder turned out to be more of an involuntary bot-slaughter. I 
made a deal with Fred to let him take care of Lisa’s sentencing. He cyber-grounded 
her for one month and asked me in return to refrain from banning the whole family. 
Fred also gave me some information as to the reason for this incident - that Lisa 
had more or less just gone along for the excitement. The brain of the operation and 
the key to this mystery was Bart. 

I felt a bit like I was holding up a prize catch to the camera when it had suddenly 
slipped out of my grip and back into the water. I needed to get hold of Bart and it 
was not going to be easy. Whenever I saw him he discreetly left the world we were 
in. Once again I needed help, but this time I had a better idea about where to get it. 

I started by meeting with one of the wizards of the main world of the Palace 
universe. He promised to help me organize a multi-world ban of Bart if he did not 
turn himself in. While Bart surely could stand not being able to visit the Virtual 
MIT House, a ban that would in effect keep him from meeting any of his Palace 
friends would be a very potent punishment. The next step was to spread the word 
about the meeting and my plans among our mutual Palace friends. My plan 
worked. It did not take long before he came to see me. 

Bart’s explanation for his actions was that Trinity had promised to make him 
wizard in the Virtual MIT House. When he found out that this had been an empty 
promise he wanted revenge. His plan had initially been to destroy only things that
Trinity had made. Yet things had got a bit out of hand and some additional
property had been damaged. He seemed absolutely terrified by the risk of getting 
banned from all the big Palace worlds. He said that he would rather be grounded
physically than cyber-grounded. After all, the virtual world was where he had
most of his friends and where he spent most of his free time. My feelings towards 
Bart had up to this meeting been annoyance rather than anger, but when I met him, 
I realized for the first time that I was not completely without blame in this affair
myself. I had taken on my wizards in a rather random fashion and had probably not
bothered to be very clear about the fact that I did not have any intention of bringing 
on any more. To me this wizard business had never been a big thing, but to some 
of the people out there it had been an important career opportunity. 

I had become a god by installing and executing the server software. To me the
responsibilities of being a god were mainly technical in nature; I had to keep the 
server running. However, I also had the power to appoint people to important 
positions within the little community that had emerged and to punish inappropriate 
behavior. By controlling the technical system I had been given responsibility for 
the community without fully realizing it. I had underestimated the social 
responsibility of being a god. I decided to go easy on Bart and only ban him from 
the Virtual MIT House for one month. We were both very satisfied with the 
conclusion of the whole affair and shook hands before leaving the meeting. 


4.4 Analyzing Virtual Worlds 

The rest of this chapter will use this story to discuss some basic characteristics of
social virtual worlds. I would like to begin at the end by reflecting on Bart’s
comment that he would rather be shut out from the physical world than from the 
Palace worlds. This comment stands in stark contrast to numerous studies claiming 
that computer-mediated communication is very limited compared with face-to-face 
communication. In an attempt to make sense of this contradiction, I would like to 
take a look at some of the underlying assumptions of these studies. 

Experimental psychologists performed a number of studies comparing face-to- 
face communication with different kinds of mediated communication in the 1970s 
(see [1] for an overview). The explicitly stated purpose of these studies was often 
to look for negative psychological effects from the use of communications media. 
Face-to-face communication was used as the “gold standard” that mediated 
communication had to live up to or be rejected. The idea that mediated 
communication could also have social or psychological advantages over face-to- 
face communication was not even considered in these experiments. Instead, their 
starting point was to decide which types of meetings could safely be electronically 
mediated, and which ones had to be performed face-to-face. 
These experiments were based on a positivist approach to social science, which
meant that the effects of using communications technology had to be tested in 
laboratory experiments in order to be controllable and reproducible. This 
sometimes led to situations bordering on the absurd, such as having research
subjects hold conversations wearing cardboard masks to test the importance of
nonverbal facial cues [1]. In taking communication out of its context, these 
experiments also failed to take into consideration that different forms of mediated 
communication normally require a period of adaptation to the medium before the 
participants can use it to communicate successfully. Moreover, the experiments 
were mostly about problem solving, information finding or decision making - and 
measured quantitatively with speed and efficiency as success factors. This again
relates to the preferred research method, but also to the expected contexts of future
use, which were strictly work-related.

This positivist, decontextualized, and work-related view was also shared by Daft 
and Lengel [2] in their formulation of media richness theory. They argue that the 
communication richness of a medium is an invariant and objective property of 
communications media, and once again they rank face-to-face communication as 
the richest medium. They also state that their theory was originally formulated to 
help address issues of information processing in organizations. Media richness 
theory has had a strong impact on studies of computer-mediated communication
and on information systems research during the 1980s and early
1990s, but its popularity has recently declined [3].

One of the clearest findings of the studies of mediated communication in the 
1970s and 1980s was that mediated communication might be a good way to 
conduct formal meetings among people who already know each other, but it is an 
inadequate way for people to share emotional content, let alone develop 
meaningful and long-lasting relationships [1,4]. Based on these studies, the lack of 
non-verbal cues, such as body language and speech intonation, would make it 
difficult to convey complex emotional content using a system like the Palace. In 
addition, having to type everything you want to say would make communication 
inefficiently slow. 

Since the Palace is a graphical system based on a spatial metaphor with graphical 
representations of the participants, it is also possible to use it for different types of 
interaction beyond communication. Widening the scope from communication to 
interaction, we can note that since we cannot make much use of our physical body 
in the Palace, the gap with the gold standard of physical interaction becomes even 
wider. It seems clear from this research that this medium is so limited that it is 
confined to offering a second-rate copy of interaction in the physical world. So 
why would someone like Bart choose the Palace over social interaction in the 
physical world? 

With my background in informatics, my original research interest in virtual 
worlds was geared towards how to design them. Yet I also believe that good design 
must be based on a fundamental understanding of these worlds and the social 
interaction going on within them. And I must admit that I myself had a hard time 
understanding what was really going on in virtual worlds when I started visiting 
the Palace. Thus I decided that what I needed to do was to try to put my own 
preconceptions aside and just hang around to try to form a basic understanding of
what made this social system tick. 

I wanted to form an inside view of the phenomenon before I even tried to 
formulate which research questions might be interesting to pursue. My approach 
stands in contrast to the traditional research approach to mediated communication, 
which takes an outside position. The inside view is both about adopting a 
contextualized approach and about actually stepping into the medium that I am 
studying. By spending time on the inside, I started to see the solution to the
paradox of the appeal of this poor interaction medium.

The first thing I realized was that Palace participants do not want something that 
adequately mimics the face-to-face interaction of the physical world. The whole 
point of the virtual world is that it is different. The fact that you do not have to 
reveal your face and body to the people you are interacting with is a core feature of 
the Palace and a fundamental influence on the social interaction. It certainly does 
have its drawbacks. It is, for instance, harder to tell if a person is ironic or sincere 
without cues from intonation, facial expression, etc. Nevertheless, the opportunity 
to present oneself to others as a graphical image of one’s own choice is clearly 
very compelling to many people. The possibility of concealing unwanted cues such as
blushing, stuttering, or talking with an accent is never considered in quality
estimations of mediated interaction, and is therefore lacking from the outside view. 

But can the possibility to hide the physical body behind a digital image really be 
something good? Isn’t this deception? Yes, it certainly has an element of deceit,
but so does wearing clothes and make-up, for instance. Goffman [5] has shown that
we constantly put considerable effort into presenting ourselves to others in a way 
that we hope is as favorable as possible. In doing so we will typically take on a 
number of different roles: one for giving presentations, one for chatting in the
coffee room, etc. 

Virtual worlds give us the opportunity to take on yet another role, a role that has 
certain properties that face-to-face interaction can rarely have. I know from my 
own experiences that playing this role can relieve tension from otherwise pressing 
situations, and I also know from my interviews that it can help people with 
different disabilities to interact with other people without standing out, or feeling 
pitied. And, not least, it can actually be quite a lot of fun. Having said this, it must 
be added that this particular feature of virtual world interaction can also cause 
problems. Many of these stem from difficulties in maintaining the distinction 
between the presented self of another person and what dwells behind that 
presentation. 

This effect, according to Reeves and Nass [6], applies to a wide range of new 
media. They found, as the title of their study indicates, that people generally deal 
with media with human-like qualities as if they were dealing with actual people. 
We do, for instance, have a tendency to unconsciously treat a computer politely if 
it asks questions politely - although we know perfectly well that it cannot feel hurt 
or insulted or even understand our replies. In the same way, we start thinking of a 
conversation partner as actually having some of the properties of his or her avatar. 
Just as with movies, we do not have to make a conscious effort to suspend disbelief 
before what we see can captivate us. Once we start chatting and moving around 
with our avatar, we are there until we consciously tell ourselves otherwise. 
In addition to disregarding the new possibilities of interaction, traditional
research on mediated communication also underestimates the ability to adapt to the
medium and work around the problems that it imposes. From the inside, it becomes 
apparent that it is necessary not only to change how we do things, but also what we 
do. Here is an example of an activity which was triggered by the inherent 
affordances of the system: one day a person changed her avatar name into a short 
sentence, so that it was shown beneath the avatar where the name would usually 
appear. I do not remember exactly what it was, but it was something silly like: 
“I’m with stupid ->”. She then positioned her avatar next to her friend’s avatar. Of
course, the friend responded by putting something like “Me too!” in his nametag. It 
was not long before the whole room was a long chain of avatars jointly creating 
elaborate sentences using the text space intended for their names. 

This example appeals to me because it not only shows that virtual worlds have 
unique properties that can and will be woven into the interaction, but also shows 
how the participants of a world take part in the construction of their environment 
by extending the uses of built-in features of the system beyond the original 
intentions of the designers. Most importantly, it shows how the possibilities of a 
system, in this case a creative collaborative environment, emerge when people 
appropriate the system in a way that is very easily overlooked when making a 
priori assumptions about what a medium can be used for, such as work-related 
meetings, before starting to study it. 

Neo and Trinity, as well as Bart and Lisa, had tried the Palace and found a
place where social interaction differed in many ways from that in the
physical world. Instead of worrying about the fact that problem solving was slower 
than in face-to-face interaction, or that non-verbal cues were harder to get across, 
they started to experiment with the system and found qualities that were unique to 
the medium. That was why access to this social arena was so precious to Bart. 


4.5 A Game of Life? 

A common conviction among people who have no first-hand experience of virtual 
worlds is that it does not really matter what happens in a virtual world because, 
after all, it is not for real. “It’s just a game.” I don’t have a problem in 
understanding this attitude. After all, seeing someone engaged in interaction in a 
virtual world looks very similar to seeing someone playing a computer game, and 
the large virtual world systems available today tend to borrow their aesthetics
from popular culture, such as cartoons or science fiction. However, appearances 
can be deceptive. In a virtual world, sticks and stones can’t break my bones, but 
that does not mean that I would take no notice of someone trying to throw stones at 
me or beating me with a stick. My mind and my emotions are present, and virtual
actions can have effects on my mental state that are as real as anything
I might experience in the physical world. Dibbell [7] makes this point very 
effectively in his much-discussed account of an incident of virtual rape in 
LambdaMOO. 

I can see how it must be hard to understand how strong the emotional 
involvement in a virtual world can be. Picturing a person sitting in front of a 
computer screen seems to signal distance and detachment, and in the terminology
of virtual reality this situation would be characterized as entailing a low degree of 
immersion. Only a few of the senses are engaged, the level of stimuli or input is 
low, and outside stimuli have not been shut out. However, from an inside view, 
another type of immersion emerges. Consider the following extract from LeValley
[8]:

I danced for my cyberspace husband, whom I had recently 
virtually eloped with, in-world. The dancing was a delightful and 
deeply moving experience. I danced with a silver teapot, with a 
chest, with my Asian female head and with my cyberhubby’s frog 
head (with outstretched tongue and fly) on the back of my left 
hand. I placed a fern on the floor of a temple room and I danced 
up out of it and back into it. I danced in the silence. I danced for a 
long time. I was fully engaged in the floating of the dance and in 
the act of dancing in beauty for him. 

The next morning, when I awoke in my primary referential 
context, I remembered the dancing, not only the image of the 
dancing but also the sensuality of the dancing. I had sensori-motor 
memory of the dance. I recalled the slight movement of the air on 
my face as I floated up and down, up and down. I remembered the 
funny feeling in my tummy from this movement. I remembered
the feeling of my arms outstretched with objects on my hand. I 
remembered the silence and the way time was suspended. I 
remembered both the solitariness of my self expression, in this 
dance, as well as my deep emotional connection to my 
cyberhusband. And I remembered all of this in my physical 
waking world body. 

One might think that she has tried some new incredible virtual reality system 
with astonishing performance, but her recollections describe an event from 
Worldsaway, a system fairly similar to the Palace (see Chapter 3). According to 
various models for measuring presence, such as [9, 10], the above system would 
score poorly. Still, the experience seemed so strong. Again we have something of a 
paradox on our hands, and again I believe the answer lies in the assumption that 
what Slater refers to as “the objective world” can be used as the standard for 
measurement. However, in this case, her emotional state of mind seems to be a 
much more important factor than, for instance, the realism of the environment. And 
which realism would that be anyway? Would the fact that her “cyberhubby” had a 
frog’s head matter, or would it be the degree to which the frog’s head looked like a 
living frog? 

So far I have argued that the interaction in virtual worlds is real interaction with
real emotions and real consequences, but this does not make the worlds themselves 
real. I would, however, also like to argue that the environments and inanimate 
objects of a virtual world are as real as objects in the physical world - although 
different. Let us take one of those beers that Bill used to serve before his untimely 
demise as an example. Although we will not get less thirsty, and we will not get 
drunk no matter how many virtual beers we choose to guzzle, we all know that 
buying someone a beer means something more. It might serve as an invitation to a 
conversation, or a sign of gratitude or even friendship. It doesn’t matter so much
that we do not have to sacrifice any money to buy a beer; it is still precious to have 
someone engaging in the symbolic act of ordering a beer and handing it over to 
you in a virtual bar. The meaning of the act is also conveyed in a virtual bar. 

However, although the beer does not cost anything, it is interesting to note that 
the laws of inflation still work. If you order beers for everybody who enters Bill’s 
bar, its symbolic value will be deflated. But if instead you take the time to design a 
custom-made drink and offer this to someone, the gesture will be more potent than 
offering a generic drink. The system with props that can be shared and custom- 
made fills a function in the ongoing construction of social life. This activity can, of 
course, also be found in virtual worlds without props. Perhaps this behavior will be 
present wherever people meet. Nevertheless, the functions embedded in virtual
worlds technology influence what people will do in the world. In Worldsaway, for
instance, the existence of a monetary system has made trading an integral part of 
the social interaction of that system. 

So a beer in the Palace has different characteristics from a physical beer, but 
some of the symbolic significance is left intact. The symbolic significance is also 
very important in the use of physical objects. We frequently use these objects as 
equipment to try to convey a desired image of ourselves to the people around us. 
Goffman [5] refers to objects used in this manner as sign-equipment. In the 
following passage, different types of beverages send different messages. Note how 
the limited supply is a deciding factor in the effect of the sign-equipment, just as in 
the Palace. 


Thus, in the crofting community studied by the writer, hosts often 
marked the visit of a friend by offering him a shot of hard liquor, 
a glass of wine, some home-made brew or a cup of tea. The higher 
the rank or temporary ceremonial status of the visitor, the more 
likely he was to receive an offering near the liquor end of the 
continuum. Now one problem associated with this range of sign- 
equipment was that some crofters could not afford to keep a bottle 
of hard liquor, so that wine tended to be the most indulgent 
gesture they could employ ([5], p. 29). 


I began this section by comparing the surface of virtual worlds interaction to 
playing games. By looking beneath the surface, I found that what people do in 
these virtual worlds is really no different from what they do at work, at home, or in 
bars. They are not playing games; they are living their lives. 


4.6 Old Wine in New Bottles?

While some predictions about computer-mediated communication have gone astray
because of erroneous comparisons with the physical world, others have erred by
assuming that social interaction on the Internet differs from interaction in the
physical world in ways that it does not. One misconception about
virtual worlds, as well as about online life generally, is that everyone is equal on
the net. My experience runs counter to this belief (see also Chapter 11). According
to Goffman [5], we form social relationships by adopting consistent behaviors over 
time, and we take on social roles by enacting the rights and duties attached to a 
given social status. When we strive for higher social status, we also accept a 
stratification of the social structure and develop deference towards those who have 
reached higher levels. The foundation of the social system is consensus regarding 
values, and an assumption that the higher levels of status converge with that set of 
values. 

The concept of social status can thus also be used in virtual worlds, and it is as 
important in understanding a virtual society as it is in any ordinary society. What is 
new is that the criteria have shifted. The set of values is different. In physical life, 
things like money, work and how you look are important for how people will treat 
you. But those things are downplayed by the characteristics of virtual worlds. The 
old set of values is replaced by a new set. Instead of money, you need props; 
instead of a high status job, you need computer skills; and instead of looking good 
physically, you need to look good on the screen. This means that someone like Bart 
suddenly has a chance at becoming someone important. 

I don’t know very much about the real person behind the online identity of 
“Bart”, but my guess is that he found himself elevated to a new level of social 
status in the Palace and recognized the opportunities that this new world presented 
to him. If I am right about Bart’s experience, it is no longer so hard to imagine how 
he felt when the promise of becoming a wizard was first held out to him, only to be 
taken back later. 

Another example of the shift in the social hierarchy is what I experienced when I 
started my own world. At the time I was a fairly new doctoral student in the
department and had been an undergraduate not long before, but when I occasionally
arranged some recreational activities in my Palace I became the center of attention. 
I could decide the rules and those who did not obey got a dose of the wrath of God 
(me). Compared to an ordinary departmental seminar things were turned upside- 
down. A young doctoral student who would ordinarily keep a low profile on those 
occasions could feel right at home in this environment where he would tend to 
dominate the meeting, while the senior researchers who often do much of the 
talking in the seminars were not even present. 

The same phenomenon was apparent when I tried using the Palace in an 
undergraduate education setting. The distance between my students and myself 
shortened considerably in this scenario. I had made an office where they could 
come and get their assignments and report their results as well as a coffee room 
where they could hang out and chat. The coffee room worked especially well as a 
status leveler. They even told dirty jokes in my presence, which has never 
happened, before or since, when I have spent the coffee break together with my 
students in a physical coffee room. In short, the medium affected the discourse. 

Traditional status structures are broken down and redefined, for better and for 
worse. This can, in turn, lead to a conflict between those who have something to
gain from trying to keep the traditional structures intact and those who want
a fresh start. It is important to remember that there are no absolute borders keeping 
the virtual world separate from the physical world. A person who is a lawyer in the 
physical world but a newcomer to the virtual world is probably more prone to try
to bring the conversation into the area of occupation in the physical world in the 
hope of transferring some status points, while a person who has attained higher 
status in the virtual world hierarchy but does not have very much going for him in 
the physical world might try to change the subject. 

Another misconception about participants in virtual worlds is that they are totally 
anonymous. Suppose that I had really wanted to punish Bart by banning
him from as much of the Palace universe as possible. As I have suggested earlier, I
could have used the IP number of his computer. But since that computer was 
assigned a slightly different number each time he connected to the net, I would 
have had to ban not only all the people using that computer, but all the people 
using the same internet service providers as he did. Another way to ban him would 
be to put his nickname in a ban list. But since he can change his name to whatever 
he wishes at any time, we might conclude that this would be a poor attempt at 
keeping him out. However, there is a catch: to escape his sentence, he would have 
to give up his name, and by doing that he would also give up his identity. 

This brings us back to the need to fit into the social context. Like everyone else 
in the Palace, Bart had built up a personal community of people around him. He 
had invested time and effort in his relations with these people, and these 
investments resided in the connections to these people in the form of social capital 
(a discussion of personal relations, social capital, and virtual communities can be 
found in Agren [11]). 

Without his identity, he would also be without the key to all the resources he had 
created for himself within this community. In fact, Goffman [5] notes that you
simply cannot belong to a society without stability of self-presentation. Likewise, 
Schiano and White [12] found that there exists a social pressure in virtual worlds to 
maintain a stable primary identity. So, as it turns out, I would not even have had to 
put his name on any ban list. I could just as well have let it be known that he had
done something that violated the values of the society - and he would
have had to see his social investments get flushed away. The lesson here is that we 
are not anonymous in virtual worlds. We are held responsible for our actions. All 
societies, virtual or physical, demand that we contribute something in order to
benefit from being part of them, and, to keep tabs on our contributions, there have to
be identifiers; without an identifier, or an identity, there can be no payback.


4.7 The Importance of Being There 

So this was the tragic story of Bill the bot. However insignificant my dear 
bartender bot’s existence might have been, at least he gave us the opportunity to 
catch a glimpse of the inner workings of social interaction in virtual worlds. I have 
argued that it is easy to make erroneous assumptions about life in virtual worlds if
you stand at a distance from the phenomenon and try to make predictions from the
outside. Perhaps one reason for this is that virtual worlds are metaphorically
problematic. It seems to be hard to understand intuitively what a virtual world is 
and how it works, and easy to make unfair comparisons with the physical world. I 
have talked about virtual worlds as a type of medium, but they are unlike any other
medium in that they allow participants to enter them and interact in a non-physical
location. Virtual worlds should not be thought of as a tool for a specific purpose 
such as work, education, play or entertainment. The term “virtual worlds” is a very
apt description in the sense that they, just like the physical world, are “general
purpose”.

In this chapter I have tried to show that the qualities of social interaction in 
virtual worlds are fundamentally different from those of interaction in the physical 
world. Much of the interaction that occurs in virtual worlds has no counterpart in 
the physical world since it is shaped by the unique characteristics of the medium. 

I have also argued that interaction in virtual worlds is real. Watching someone 
engaged in interaction in a virtual world does look like someone playing a 
computer game, and we all seem to share some intuitive ideas about face-to-face 
interaction with other people as something important and fundamental for us 
humans - even if it is just chatting in the coffee room. But imagine a game that 
consists of the same form of interaction as in a coffee room and that is played with 
the same continuity. What is it then that makes the coffee room setting real and the 
game not? Without diving into the depths of ontology and linguistics, I think that
the word “real” is poorly suited to capturing this difference.

Finally, I have argued that people participating in virtual worlds do establish 
persistent identities and form hierarchical social structures just like in communities 
in the physical world. The fact that it is hard to bring symbols of social status from 
the physical world into virtual worlds might lead to the assumption that social 
structures should be flat - but apparently we do not want it to be that way. We 
somehow always find ways to build social structures and ways to denote social 
status, and one prerequisite for this is accountability for our actions through 
identities. 

I would like to conclude with the general observation that people continue to 
behave like people - whether the world around them is virtual or physical. 
However, the technology that mediates our interaction has a great impact on the 
forms of interaction. This in turn implies that theories about human behavior can 
very well be used in this new context, but the physical world cannot be used as a 
yardstick for comparison. 
Chapter 5


30 Days in Active Worlds:
Community, Design and Terrorism in
a Virtual World

Andrew Hudson-Smith 


5.1 Introduction 

The idea behind “30 Days in Active Worlds” was to document fully the 
development of a virtual environment from beginning to end, as a plot of virgin 
virtual land which, it was hoped, would develop into a community and a fully- 
fledged new virtual world. The aim was not to create a dialog of life in the virtual 
environment, such as the well-documented My Tiny Life by Julian Dibbell [1] or 
The Cybergypsies by Indra Sinha [2]. Yet the events that unfolded over the 30-day
period led to just such a documentation and, along with it, shaped my views not only about
community and design in a virtual environment, but also about the increasingly
blurred boundaries between what is real and what is virtual. The title “30 Days in
Active Worlds” stems from the free trial software of the Active Worlds (AW) 
server, which allows users to host their own world. The trial software operates for 
30 days before timing out, enabling users to set up and run their own worlds and 
small communities before having to purchase a full server from AW. AW is a 
commercial multi-user system operating on a standard Windows-based system 
with a modem connection. Distributed and run from Newburyport, north of Boston, 
the AW Universe currently consists of over 700 worlds with an average of 400 
users logged in at any one time. Users, or citizens as they are known, appear as 
avatars. Avatars are the citizens’ graphical icons in the AW system, and the choice 
of avatars ranges from a large male biker called “Butch” to the petite female of
“Tanya” with many incarnations in-between.

My chosen avatar in the Universe was “Butch”, the alter ego of “Smithee” (my 
name in AW). My choice of “Butch” is not a reflection on my own real world 
appearance, heaven forbid! Nevertheless, it is the one that I normally adopted. I 
had trawled the then 423 worlds that made up the AW Universe for the previous 
six months as part of my research at University College London, which was concerned 
with placing three-dimensional models of urban environments on the Web. Jumping in 
and out of these various virtual worlds, I was struck by the diversity of 
environments that make up the AW Universe. Worlds range from the very social 
environment of “Friends” to the more desolate frontier atmosphere of “Mars”. 
Each of these worlds has its own characteristics and user base. Jumping in and out 
of these worlds allowed me to meet a number of regulars in the AW Universe, 
citizens that I would subsequently involve in the shaping of my own world. 

The research, under the banner of “Online Planning”, was aimed at opening up 
the possibilities of placing an urban planning system online within a shared virtual 
environment, allowing citizen participation, online democracy and other similar 
utopian ideals. It was this research that led me to download my own AW world, 
which I duly placed on an available Windows 95 machine in the corner of my 
office in London. 

The idea of using my downloaded world to allow users free rein to build, say 
or do whatever they desired in the virtual environment came in the middle of a 
telephone call from a journalist called Tony Durham. Tony rang from the Times 
Higher Education Supplement asking what I was up to, having read about my 
research in a previous month’s Sunday Telegraph magazine. As a researcher I was 
obviously keen to get as much exposure for my work as possible, and the fact that the 
Times Higher Education Supplement had rung up was a great opportunity. Thus 
was born, in the middle of a conversation, the idea of opening up my server to the 
world and letting people build, say and basically do whatever they wanted, all in the 
name of “research”. Everything that took place in the world would be logged, 
allowing us to see what was built and when, and what was said, by whom and at what time. 
Essentially, it would allow me to log the development of a virtual world and 
community from day one, in high detail, until its closure on day 30. Everything 
was set to go live on November 30th 1998, with the Times Higher Education 
Supplement running a small article on it the following week to announce its 
opening. 

The world was set up on my Windows machine, entitled the “Collaborative 
Virtual Design Studio” or “CVDS” for short. It was the first world to log both 
building and conversation in a virtual environment and tied in well with my 
previous research. On entering this new world, all users would be greeted with an 
infinite space of virgin green land which meets a blue horizon above a mountain 
range. I set about putting up welcoming signs and an entry space at Ground Zero in 
CVDS, so called as it is at 0 North, 0 West on the AW coordinate system. I had 
previously contacted AW and informed them of my research, and they were keen 
to help and extended my world out to 69N 69W (the normal free world only 
extends to 25N 25W). This gave me a world approximately the size of Soho in 
London in which I needed to place a range of objects with which the users could 
build. 

The main feature of AW is that users can claim land and build (for a description 
of how building takes place and the typical building patterns, see my essay with 
Schroeder and Huxor [3]). Building is carried out using a selection of predefined 
objects from windows, doors and walls to trees, shrubs and paving tiles. These 
objects can be cloned and placed on virtual land to create what is essentially a large 
virtual “Lego” set. To aid users in my world, a builder’s yard was set up, located 
69N, 69W, in which 368 objects were laid out, ready to be selected and cloned to 
start off the building process. I was helped with the layout of the builder’s yard by 
two citizens I had met before in AW, Princess Tia and Dawny. Princess Tia and 
Dawny floated 50 meters above the builder’s yard and painstakingly cloned and 
placed each object in position, ready for the public opening. While they were in the 
virtual world I returned to the real world to set up a web page to introduce the 
project and hooked up our webcam to stream live pictures of myself and the office 
into Ground Zero for the opening. 

CVDS was, as far as I am aware, the first world to open up without any 
guidelines or laws. The aim was to see what people would build if they were 
allowed to build anything they wanted. Other worlds in the AW Universe have 
strict guidelines on what can be built, where and by whom. Systems can also be 
put in place to filter out certain words or phrases, resulting in ejection if guidelines 
are breached. Dodge and Kitchin [4] view AW as being more akin to a theme park, 
with entry granted as long as visitors comply with the various restrictions. As a test of my 
world, and its open build philosophy, I left the server open to the world overnight, 
two days before the launch. Upon arriving at work and logging into the world I was 
greeted with two large signs, placed by an anonymous user. The first sign had an 
image linked in from a sex orientated website, and the second sign was text linking 
the image with my mother. With the world set to go live on a university server and 
the work being covered by the Times Higher Education Supplement I hoped this 
wasn’t a sign of things to come! 


5.2 First Steps 

The space that makes up the AW Universe is sparsely populated with an average of 
0.5 users per world. To get a world noticed and populated, there needs to be a 
“hook”. There is no point in launching a world if no one comes to build, which 
unfortunately seems to be the case in most areas within the AW Universe. There is 
a saying on the web, that “if you build it they will come” (taken from the film Field 
of Dreams). In virtual worlds this is not necessarily true, especially if you haven’t 
built the world but want users to build it for you. I decided that a building 
competition was the way to go, and a prize of one year’s free citizenship to AW 
was on offer for the best design in the world after 30 days. The fact that the prize 
was a citizenship opened up the world for “tourists” to build. Tourists are users of 
AW that haven’t paid their $19.95 annual fee to become a citizen. This leaves them 
as something of second-class citizens; indeed, many worlds ban them altogether, and 
even where they are allowed to build, their buildings are not guaranteed to remain 
intact. Actively encouraging tourists into my world allowed them to compete for 
citizenship status, safe in the knowledge that my world was logged, and therefore 
backed up every night in case of any crime or vandalism. It also aimed to achieve a 
level of integration, changing the social dynamic that exists in other worlds by 
giving tourists equal status. Equal status was achieved to a certain extent, in that I 
allowed both tourists and citizens equal rights in where to build, but some aspects 
of segregation were “hard coded” into the software. Tourists, for example, are 
limited to the choice of two avatars in the world, compared to the normal choice of 
over 20. This instantly makes them recognizable. The normal “tourist” avatar is a 
male or female avatar with a camera round its neck, sunglasses on and a typical 
“tourist” appearance. To get around this I designed two avatars which would blend 
more into the overall environment. Without the cameras around their necks and with 
a more standard appearance, these avatars allowed the tourists in CVDS to blend 
into the environment. 

The launch of the project consisted of Princess Tia and myself standing around at 
Ground Zero, looking up at the streamed video image of myself from the plugged-in 
webcam, which was displayed 50 feet above the central region. An 
advert was placed by AW in AlphaWorld, the most popular of the worlds in the 
Universe, and various messages were spammed across newsgroups. Princess Tia 
acted as a meeter and greeter, showing users around and letting them know that 
they could build whatever they wanted in the world. An automatic message also 
warned them that all conversation and building was logged. Dawny logged in from 
Pittsburgh, Pennsylvania, USA, around 2pm Greenwich Mean Time and decided to 
act as a tourist representative from the world. She went out around the other worlds 
actively recruiting users to come into CVDS and start building. This began to give 
the project the momentum it needed, and more importantly started the word of 
mouth that would eventually create the complex social world in the Windows 95 
box in the corner of my office. 

As part of the website, a daily news section was set up to document the major 
events in the world. This site also provided images of the buildings constructed on 
each day and displayed a map of the world so far. On Day 1, 36 registered citizens 
and an unknown number of tourists placed a total of 6430 objects in the world. The 
number of tourists is unknown as they only count as “one” on the builders’ list that 
was used to map the world every 24 hours, but my impression was that there were 
twice as many tourists as citizens, a ratio that increased as word 
got out that CVDS was offering the prize of a free citizenship. The amount of 
growth was surprising from the point of view that most worlds are sparsely 
populated; CVDS had overnight become the third most popular world in the AW 
Universe. It also provided interesting items on the news page and introduced users 
(or avatars) that would become regular faces over the course of the next 29 days. 


5.3 Virtual Terrorism 

The first few days of growth went well and the research was generating interest 
from various branches of the media. On Day 2, BBC Radio 3 dropped into Betty 
B’s house, located west of Ground Zero. Betty, logging in from Amsterdam, had 
built up a house based on the ideas of cubist design. Although the interior was still 
basic, it had a path with an American-style mailbox (linked to her Hotmail account) 
and the beginnings of a well-planted garden. A technology journalist from the 
Press Association in the United Kingdom, Lawrence, known as Lorca in CVDS, 
also joined the world to write about the launch, and about how a new community 
was being built in cyberspace. As he had a deadline to meet, an interview was carried 
out at Ground Zero and he was introduced to Dawny, Princess Tia and Betty B. 
Despite a tight deadline Lorca stayed in the world during the day and even 
remained after work when the office closed down. This was an indication of things 
to come, as a number of us would spend almost every waking hour in the world - 
building, talking and generally exploring what was possible for the duration of the 
project. 

On Day 4, on logging in, I was greeted with 25 email messages and a stream of 
ICQ messages (ICQ being an instant messaging system) complaining that there was a user in the world 
knocking down buildings and placing thousands of objects. This (at that time 
unknown) user was running riot in the world, placing numerous objects and 
looking as if he intended to overload the world server. He succeeded: when 
I arrived at work, the server came to a halt and the world went offline. By 
examining the log files, I noticed that a single user had been logging on and off 
during the night and, during a period of over 5 hours, had placed over 85,000 
objects in the world. The objects had been placed using automatic building 
software called “hambot”, which had been banned in some regions of the AW 
Universe, but not in mine as users could say and build whatever they wanted. My 
world was a world without laws, and if a user wanted to place 85,000 objects they 
were allowed to do so. However, because it had crashed the main server and thus 
threatened to end the 30 Days project, we decided that the objects that had been 
placed would be cleared up, and the community would be put to a vote to see if 
they wanted laws put into place to protect the world. By the fourth day, the 
community was therefore about to set up its own police department, complete with 
call boxes and regular patrols. 

The 85,000 objects were mapped for the news page and then cleared so the 
world could go back online. Within a minute of the world going back online, users 
were coming in and surveying the damage. Houses built by tourists had been 
ripped apart and a trail of damage lay across the world, as if hit by an earthquake. 
A group gathered around Ground Zero and began to question what sort of person 
would do such a thing. Betty B mentioned that as the world was being logged, 
perhaps this was being done on purpose to see what the population’s reaction 
would be. Suddenly I became suspect number one! Although this was a virtual 
world, I realized that it was still possible to feel uncomfortable. As the group of 
avatars gathered around me, accusations came thick and fast. I managed to talk the 
users around to seeing my point of view - that it would have been pointless to 
vandalize my own world. Nevertheless, the seed of doubt had been sown. On the 
plus side, the vandalism occurred when Lorca was in the world, so it made copy in 
the Press Association, TescoNet and Excite, which brought further interest in the 
project. 

Once restarted, the world remained on an even keel until Day 9 when a tourist 
going under the name of Jero logged into the world. Jero attracted the attention of 
Betty B, who was by then a core member of the community. Jero was asking 
questions about the vandalism incident from Day 4 and it became evident from the 
nature of Jero’s conversation that he had been involved in the incident. Betty B 
sent me an ICQ message and I logged into the world. When I met Jero, he divulged 
that he was the High Commander of the AW Terrorist Group, and although he had 
not undertaken the vandalism himself, he had issued orders for it to occur. Jero 
then issued a series of threats that he would hack into the main server hosting 30 
Days in AW and shut the world down. I sent a message to all the known users in 
the world, warning that it was on “Def Con One” (a military alert) status with a new 
attack imminent. A group gathered at Ground Zero, looking towards the skyline 
and waiting to see what would happen. 

While Jero was logged on, a trace was carried out on his IP number (his internet 
address identifier) and AW.com was informed of the threat. This was mainly to 
ensure that the main world server couldn’t be hacked - as this would have 
jeopardized our local network at University College London, which I was not keen 
on. As a result of the IP trace, AW.com contacted the internet service provider 
(ISP) of Jero and obtained a contact number for the user’s account. The 
information was provided by the ISP due to a number of complaints that AW.com 
had had in relation to several recurring IP numbers. Apparently the AW Terrorist 
Group had been quite active in the preceding months. AW contacted Jero, or to be 
more precise, Jero’s father, in Vancouver, Canada. Legal action was threatened and 
Jero, a 15-year-old teenager logged in from his bedroom, had his computer taken 
away from him. The computer was not, however, removed before Jero 
could issue one final threat. Jero logged into CVDS one last time on Day 11. 
He teleported to my location and issued the threat that I had upset the wrong 
person, and the group would be taking imminent action against me. Knowing he 
posed no serious threat, I ejected him from the world and thought that that would 
be the last I saw of him. Within half an hour I had lost all internet access from my 
personal machine at University College London. The world remained online, but I 
couldn’t even pick up my email. I remained unable to connect for over 6 hours 
until my internet connection was restored - a hacker had entered my machine via 
my own personal web server and disabled all my network card settings! 


5.4 Virtual Coffee: Community in 30 Days 

Community is central to the development of all virtual worlds, whether they are 
purely text-based systems such as LambdaMOO or three-dimensional virtual 
worlds like AW. A system will either thrive or decline according to the size and 
enthusiasm of its community - and 30 Days in AW was no exception. By the third 
day a group of 8-10 users were becoming regular builders in the world. 

Figure 5.1 shows a group of the users participating in “30 Days”. The photograph 
was captured on the final day of building, by which time a core community was 
firmly in place. The names on the image are difficult to decipher but the users, as 
in real life, can be identified from their appearance alone. Dawny adopted the 
“Tanya” avatar with long flowing red hair and a rather 1970s green dress. Betty B 
on the other hand chose the “Rachel” avatar with blonde hair tied in a ponytail. 
Lorca always logged in as “James” and Stick chose “Hotep”, walking around the 
world in his Egyptian outfit. The users had over 20 avatars to choose from and each 
person adopted a certain look, essentially recreating their own identity in the 
virtual world. In more populated worlds the restricted choice of avatars limits the 
ability to choose an avatar that represents one’s identity, but in “30 Days”, with its 
8-10 core users, we could each have our own look. Nevertheless, the limited range 
of avatars can be restricting compared to text-based virtual worlds. For example, in 
LambdaMOO users create their own identity through textual expression, allowing 
each to create their own unique personality in the environment. “30 Days” allowed 
the avatars to become associated with each person and in a sense each avatar began 
to resemble a friendly face when entering the world, or walking in on a social 
gathering that was taking place. The exception to each person’s use of a fixed 
avatar was National Butch Day, organized by Stick. Stick organized the day via a 
series of emails and telegrams informing users that they were required to adopt the 
Butch avatar for the duration of Day 20. Butch was selected because he was my 
chosen avatar identity in the world. I was blissfully unaware of the nature of the 
day until I logged in as usual and found a group of users around Ground Zero all 
looking like me with the Butch avatar! My initial confusion, at having no 
recognizable faces to identify, was greeted with much amusement and provided the 
opportunity for numerous Smithee impersonations by the members of the 
community. 



Figure 5.1 Group of avatars at Ground Zero of “30 Days in AW”. 

Such incidents underline the sense of community that developed in “30 Days”. 
As with many descriptions of virtual worlds, the reader will often get the 
impression that one really had to be there to appreciate the sense of involvement. 
“Being there” essentially sums up the feeling of community involvement and 
excitement in the world. It was a feeling of being involved in something that had 
the potential to become part of the history of the development of virtual worlds and 
of community. A frontier attitude developed, one of shaping new worlds and 
seeing what we could do. Logging into the world started to feel like arriving home. 
I would leave my small flat in North London every day, catch the tube to King’s 
Cross Station, arrive at work, and log in. As my avatar appeared at Ground Zero I 
would be greeted by upwards of 8 users already logged in. Each of these 
individuals would bid me good morning, afternoon or evening, according to their 
real world location, and we would embark on the daily ritual of looking at the new 
buildings and putting the world to rights over virtual coffee. 

This feeling of being home continued over the Christmas period. A Christmas 
tree was planted at Ground Zero, decorated with flashing lights and surrounded by 
presents. Shortly after my real-world Christmas lunch I sneaked away and logged 
into the world from my laptop, which was set up at my parents’ house in the country. Surprisingly, I 
found a couple of users logged in and we decided to hold an impromptu carol 
service around the Christmas tree. We all gathered around and linked in some 
MIDI-based music to the world, allowing us to hear the carols. First up was Oh 
Come All Ye Faithful, which we subsequently sang - or rather, typed. My family, 
upon finding me logged on and typing Christmas carols, were somewhat 
concerned. Why, rather than watching the traditional Queen’s Speech, was I logged 
into a virtual world with virtual carols and virtual presents? They may have had a 
point. 

The webcam, which had been streaming pictures both of me and of the machine 
that the world was running on, was constantly on, and after a while I tended not to 
notice that it was there at all. It was, however, noticed by the members of the world 
and, as early as the second day, screengrabs from my webcam were appearing 
around the world. In one incident the live webcam was copied onto four sides of a 
cube and placed on top of a column looking over the world. Around the column 
were images of avatars with their hands raised to my image in praise. It gave the 
webcam an almost god-like appearance with the images of me looking out over the 
world that I had set up. Over the coming weeks the webcam became a focus for 
certain members of the community. One member, whom I will call Paul for 
anonymity’s sake, logged in on the twelfth day and asked me to come and look at 
the new home he had built. Such requests were normal as buildings were central to 
the nature of the world, but I wasn’t prepared for what would greet me when I 
walked into his house. 

Upon walking into his lounge, all I could see were pictures of myself pasted onto 
the walls. The pictures had been grabbed from the webcam and placed on an 
outside web server to hotlink back into the world. They were grabs of me which 
showed my normal office routine and as such, at least in my view, weren’t too 
interesting. A couple of them showed me with a cup of coffee, one with a 
chocolate bar and a few while I sat and ate my lunch. Each of these images had 
captions attached, such as “Smithee loves coffee” or, in the case of an image of my 
empty chair, “Where is our leader?”. When Paul asked for my views on his new 
house I was lost for words. The question that entered my mind was: why? It then 
dawned on me that while I was looking at the house and at the images of myself in 
the virtual world, I was also being watched live on the webcam. Paul asked why I 
looked shocked, to which I replied by blaming an email I had been reading - and not 
the fact that he had images of my real world self all over his lounge wall. 

After finding the images in Paul’s lounge, I began to have the feeling of being 
watched. Thus I decided to take the webcam offline temporarily for a couple of 
days, with the blank screens on the webcam blamed on a technical error. While I 
was out of the office, my colleague took a phone call from Paul, who asked him 
to reconnect the webcam without telling me so I could be watched when I came 
back into the office. My colleague refused and told me what had happened on my 
return. I suddenly realized that I had inadvertently opened myself up completely to 
the online world. My telephone number and location were on our main research 
website and images of me were beamed live into the world. Although the intention 
was to ground the research and give it a human face, I felt that a line had been 
crossed between the virtual world that I had set up and the real world in which I 
worked. 

I logged into the world and confronted Paul, who had by then informed other 
members of the world that he had phoned my office. I stated my reservations about 
his behavior and my view that virtual is virtual and real is real. The members of 
the community didn’t seem to share my concern, yet my view was that the webcam 
had become something that was distracting from the purpose of the world rather 
than aiding it. The webcam remains offline to this day, although images grabbed 
from it can still be seen in the world. 


5.5 Spatial Development 

The only sections of the world that I personally created in “30 Days” were Ground 
Zero and the Builder’s Yard. By the end of Day 30, 27,699 objects made up the 
world, placed by 49 registered users and an unknown number of tourists. 

Figure 5.2 shows the final map of the world, with its buildings and infrastructure 
clearly visible. The world consisted of a number of houses, nightclubs, museums, 
bars, health centers and even a lovers’ lane, complete with an adjacent motel. The 
majority of the structures mirrored reality, or rather a utopian view of reality. The 
world consists of a number of country cottages with long tree-lined paths leading 
to a rustic front door and into an open space with a roaring open fire. Wooden 
American-style lodges are also prevalent, standing side by side with skyscrapers 
made out of glass and floating castles. The placement of American-style log cabins 
harks back to the initial building when the AW Universe first opened. As we have 
previously pointed out [3], the early buildings in parts of Active Worlds tended to 
be like log cabins, owing more to the television series Little House on the Prairie 
than to an imagined “cyberspace”. “30 Days” seemed to be mirroring this frontier 
philosophy. 

The majority of these structures have doorways, windows, and flights of stairs or 
escalators. Yet in the virtual world there is no need for doorways or stairs as 
avatars can walk through walls and fly up to reach new floors. Indeed, the 
navigation system of AW makes the climbing of flights of stairs notoriously 
difficult with the avatar often getting stuck and being forced to fly. Stairs, doors, 
and chairs (avatars are unable to sit) are all part of the standard set of objects, and 
this has a direct influence on the structures built in the world. 

AW can be seen as a huge construction kit with a set number of objects. In 
addition to the nature of the objects, there is the widespread inclination of users to 
create structures that mirror the real world as much as possible - given the 
constraints of the system. An example of this is the Dark Night Bar. The bar has a 
gents’ toilet, complete with urinals, washbasins and a mirror. None of these objects 
serves a functional purpose, but they add to the level of immersion 
in the world. Similarly, there is the grid-pattern road layout that criss-crosses the 
world. The road network developed between the sixth and tenth days, aimed at 
influencing the development of the world and extending its development along the 
newly placed highways. This was a highly labor-intensive exercise, especially 
when one considers that there are no cars or vehicles in AW. Part of the reason for 
the construction was that people knew that the world was mapped every 24 hours, 
with a new image placed on the website’s news page. The prospect of seeing one’s 
creations mapped led to a spate of incidents known as satellite writing (see Figure 
5.2). 



Figure 5.2 Map of the final days’ building in “30 Days in AW”. 

Satellite writing is text that, although it is indistinguishable from the ground, 
appears when the world is mapped from above, in the manner that the Nazca Lines 
in the Peruvian Desert are only discernible from the air. Mapping the world each 
night was like taking a satellite view of the world as it developed, and thus 
revealed structures that could not be seen from the ground. The first words to 
appear were “Hi” on Day 6. By Day 7, the words “By Cyberhar” had been added. 
Cyberhar was also the architect of the CyberHar Castle and an alien made out of 
colored glass in the northern reaches of the world. 

All of these structures have been lone creations. Lovers’ Lane was the only 
creation in the world that engaged the whole community. Lovers’ Lane was set up 
by Dawny and her real life partner Ken as a romantic area of the world where love 
poems, and more importantly photographs of partners, could be posted. With input 
from the community, this transformed into an area where pictures of persons 
generally (not just romantic partners) were displayed, and it provided a focal point 
for members of the “30 Days” community to find out about each other. The photos 
even extended to images of people’s pets, family and friends, essentially giving the 
world a human face. 

Lovers’ Lane was a section of the world built by the community for the 
community, whereas Stick was working on his own building project for the benefit 
of an external community. Stick’s Community Church was built specifically for the 
purpose of prayer in “30 Days”. The Church resembles a Victorian-style English 
Christian church, complete with bell tower, stained glass windows and a church 
organ. Although built in “30 Days”, it wasn’t aimed at the internal community. 
Instead, it was built for the youth section of the Alpha Church, based in Brompton, 
London, and specifically for the Sunday School section. The similarity between the 
church’s name, “Alpha”, and AlphaWorld was purely coincidental, but allowed it to 
fit seamlessly into the AW Universe. It was actively used for meetings, and 
represents the only structure in the world that served a purpose outside of the 
virtual environment. It is worth noting that although it may seem that a church in 
“30 Days” could not portray any of the features of a real world church service, it 
was a sign that the community was coming of age. Virtual churches are a feature of 
many virtual world communities, and they tend to appear once the 
community has matured. Schroeder, Heather and Lee [5], in a study of virtual 
religion, note that a prayer meeting in a virtual world may not provide the same 
type of religious experience as a conventional church service, but it certainly 
reproduces some of the essential features of the latter - albeit in novel ways. While 
I wasn’t present at any of the services held in “30 Days” it was somehow 
reassuring that some sections of the world were used for such purposes. 


5.6 “30 Days II” and Beyond 

From the outset the fate of the world after the “30 Days” was clearly stated on the 
web page: it was in the path of an incoming asteroid and would be destroyed. This 
linked back to events in the first Active World, AlphaWorld, which was similarly 
wiped out in a cataclysmic event. In reality of course, all that would happen would 
be that the server next to my desk would be reset and the world utilized for other 
research. However, towards the end of “30 Days”, members of the community 
made it clear that they wanted the world and its community left intact. Due to its 
success in gaining media attention, AW.com granted a free one-year licence for 
the world which enabled the world to be kept running. This extension of the world 
was marked by the launch of 30 Days II, in which it was planned that - rather than 
using the existing object set - users could build their own objects and import them 
into the world. A website was created with links to various 3D software packages 
and to information on making objects for AW, a task that was by no means easy. A 
new prize was placed on offer, a CD-ROM version of AW (value $50), which 
allowed high-resolution textures, and the world would continue to be mapped and 
logged. 

Only four objects were submitted: a Condom Machine (to be placed by Lovers’ 
Lane), a Moon, a Park Bench and a Carousel. The low number of objects illustrates 
the difficulty in making custom objects. The majority of users continued to build 
with the standard object set from the Builder’s Yard. Midway through 30 Days II, I 
started a new contract at work, and although I still mapped the world, I was not 
able to log in for eight hours a day as I had done during the previous 30 Days. The 
number of users of the world began to decline and eventually the world was only 
populated by 2-3 users at its peak. The world therefore became another empty 
world in the AW Universe and began to resemble a ghost town, a feeling that is 
present in many areas of AW. The “ghost town” effect may be seen as a symptom 
of the frontier philosophy in AW. Population levels in newly-created worlds tend 
to be higher as people are attracted by the ability to take part in a new project. 
Once a world has grown and land has been claimed people often move on in search 
of the next virgin plot to build on, leaving behind them a virtual ghost town. “30 
Days” saw rapid growth, initially in the physical structure of the world and then in 
its community. Once the users had built their houses, nightclubs, or whatever took 
their fancy, the world increasingly became used for socializing and the rate of 
building declined. The world’s “hook” was that it would be logged and mapped for 
30 Days with a prize at the end, and despite the community's initial intentions to 
keep things running, the members moved elsewhere. Some of them moved on to
other multi-user systems, such as the EverQuest role-playing game by Verant
Interactive, Inc. Stick continued to use the Church for a year before his 
membership in AW expired. Lorca caught up on all the work he had missed while 
being drawn into the experiment. Dawny and Ken decided to get married in the 
world, which was set to be the first “30 Days” reunion with Stick acting as the 
virtual vicar. Unfortunately Dawny and Ken split up and the wedding never took 
place. Dawny recently got back in contact to let me know that Ken had died of a 
heart attack and asked to re-enter CVDS so she could see what he had built in 
Lovers’ Lane. It now stands as a memorial to his work in the world. Other users set
up their own worlds in the AW Universe and the central region of CVDS was 
cleared so that other research could take place on the server. I reflected on my 
300+ hours spent in the world and decided to go back out into the real world for a 
while. 

So what did “30 Days” achieve? It was the first fully mapped and logged world 
and the maps have appeared in a range of publications documenting the history of 
virtual worlds. Perhaps more importantly it allowed a community to develop - and 
for a short time to thrive - all from a standard personal computer in the corner of
my office. 
Chapter 6

Lessons Learned: Building and 
Deploying Shared Virtual 
Environments 

Lili Cheng, Shelly Farnham, and Linda Stone 


6.1 Introduction 


6.1.1 Overview 

The design and structure of virtual environments have an impact on the nature of 
the social interactions found within the environment. Our goal in this chapter is to 
understand better why some designs and structures support sustainable, dynamic 
social interactions, while others do not. Over the past six years, the Virtual Worlds 
Group at Microsoft Research [1] has designed, deployed, and studied two virtual 
environment products, Microsoft V-Chat [2] and the Virtual Worlds Platform [3]. 
With both of these products, we designed tools for others to build and deploy their 
own environments, and we built and supported environments of our own. Over the 
years, hundreds of world builders have used these products to build virtual 
environments for their own communities. Thousands of end users have visited, 
joined, and helped to develop communities. Through our prototyping, 
observations, and data collection we have learned a number of lessons about the 
design process of community builders and the impact of design on the social 
dynamics of virtual environments. This chapter describes Microsoft
V-Chat and the Virtual Worlds Platform, our research process, and the central lessons
we have learned.
6.1.2 Key Questions and Background

Our goal was to explore how the design and structure of virtual environments 
affected social interactions. More specifically, we hoped to understand better how 
to build virtual environments to foster sustainable, dynamic communities. 

When we began designing virtual environments in 1995, we drew heavily on 
existing work, particularly text-based Multi-user Domains (MUDs) and 3D
multi-player games. Having existed for almost twenty years, MUDs provided a rich set
of information about building and maintaining online social environments [4, 5]. 
MUDs had a number of qualities that we expected would be key for building 
sustainable, dynamic communities [5]. Unlike other early systems, text-based 
MUDs supported multi-user real-time interactions. The people, places, objects, and
their interactions were persistent. End users authored and contributed to the 
evolution of the dynamic environment. 

We were also interested in exploring how different interaction contexts, 
particularly graphical environments, would affect user interactions. We 
hypothesized that graphics would allow users to be more expressive, and would 
appeal to a wider audience than the text-based systems. In particular, because 
graphical environments allow for non-verbal communication, we expected that 
they would foster more engaging social interactions. 

We reviewed many graphical world projects, in particular id Software’s
DOOM/Quake 3D games [6], Habitat/WorldsAway [7], and New York
University’s interactive 3D television show YORB [8]. We found that these 
projects adopted different design approaches. For example, DOOM/Quake 
provided an immersive 3D environment with artistic control and highly structured 
user interaction. The 2D Habitat/WorldsAway projects focused on social
interaction and the effects of an economic system in a closed community. The 3D
YORB project was an accessible system that let any Manhattan Cable subscriber 
explore the world using their touch-tone phones, and add pictures, sounds, and text 
to the world. We wanted to study how different design approaches affected the 
development of a sustainable community. As a result, rather than develop an 
application that supported a particular design approach, we developed tools that 
supported multiple design solutions for building virtual environments. 


6.2 The World Building Applications 

We developed two applications that were geared toward allowing users to build 
their own virtual environments: Microsoft V-Chat, and the Virtual Worlds 
Platform. We used both applications to design and deploy our own virtual 
environments, in addition to studying how other world builders used the 
applications. 
6.2.1 Microsoft V-Chat

V-Chat is a graphical chat environment that integrates graphical representations of 
the user and the social space with standard text chat (see Figure 6.1). We built
V-Chat “on top of” the existing Microsoft Network (MSN) version 1.0 [9] text-chat
client and incorporated features like MSN user names, profile information and user 
“levels” (hosts, spectators, etc.). 

We also built tools for others to create V-Chat spaces. Environments could be 
text only (default text client), 2D, or 3D. World builders could use predesigned 
templates or create their own custom designs. They could link objects in the 3D 
environment to content on existing web pages, and they could control the artistic 
design of the space by disabling custom avatars. 

The MSN product team helped us identify world builders, and we worked with 
them to design and build the initial community-specific V-Chat spaces. We 
launched V-Chat in December 1995 on MSN v1.0, with a variety of general chat
spaces and specific community spaces. 

Several years later, V-Chat 2.0 was released on the Internet with MSN. The 
second version was built on top of Internet Relay Chat (IRC), a protocol used by 
many other text chat services and products. The V-Chat port to IRC services 
provided V-Chat end users with additional features, including a directory service 
for finding rooms, integration with text chat, and tools to create their own chat 
rooms. V-Chat ran for six years, and stopped running in March 2001. 



Figure 6.1 Image of V-Chat. 


6.2.2 Virtual Worlds Platform 

Based on feedback from V-Chat users and world builders, in 1996 we began to 
design and develop a new prototype, the Virtual Worlds Platform. V-Chat world 
builders wanted the software to support more interesting social interactions than 
just text chat. They wanted to support different scenarios ranging from 
entertainment to learning to social support to business. They wanted a custom user 
interface that met the design and feature requirements of their Internet-based
projects. 

The Virtual Worlds Platform provides a toolset that lets world builders create 
custom, web-based virtual environments on the Internet. We released the Virtual 
Worlds Platform on the Internet in 1998, and provided the platform (including 
tools and sample code) for free for non-commercial use. We additionally released 
the source code in 1999. 



Figure 6.2 Zora and HutchWorld. 

We are currently working with content developers who are building sites using 
the Virtual Worlds Platform (see Figure 6.2 for two examples of virtual 
environment projects that were built and deployed using the Virtual Worlds 
Platform). The Zora project [10, 11, 12] by Marina U. Bers at the MIT Media Lab 
is an example of a virtual environment that supports learning and storytelling. The 
HutchWorld project [13] by the Fred Hutchinson Cancer Research Center [14] and 
Microsoft Research is an example of a virtual environment that provides cancer 
patients with access to their social support networks. We continue to learn from 
both world builders and from the observations of end users to further our 
development of software to support the design and development of virtual 
environments. 


6.3 Research Process 

Over the past five years we have collected extensive information on the design and 
use of virtual environments using a number of methods, ranging from informal 
observations to formal experimental studies. For example, after releasing V-Chat in 
1996, we had two sociologists from UCLA, Peter Kollock [15] and Marc Smith
[16], gather and analyze V-Chat usage data. We analyzed usage of V-Chat over
time by repeating the study in 1998, and then again in 1999. We incorporate these 
different methods into an iterative design process. We build, deploy, and evaluate
prototypes, and then repeat the process throughout the development cycle. 


6.3.1 Iterative Design 

In some respects, the life of the environment begins when the space is deployed. 
Many new world builders spend most of their resources designing and building the 
environment, particularly focusing on the design of the 3D world. With little user 
feedback and no experience deploying a virtual environment, world builders often 
neglect to consider the needs of the community in the design. 

In addition, many world builders provide no mechanism for evaluating the usage 
of the virtual environment, other than to “wait and watch”. It is often difficult to 
determine what needs to be fixed until the world is deployed. However, in many 
cases, by the time the world is deployed, the programmers and 3D designers are no 
longer on the team, and it is too late and too expensive to make changes. We 
therefore adopted an iterative design process in which the environment is deployed
early in the development cycle and usage is evaluated continuously. A virtual
environment requires ongoing activity, modifications, design, and maintenance. 
Designing an adaptable environment that can be easily managed and modified 
based on usage habits and user feedback is key. An iterative design process gives 
world builders a better understanding of the design problem, so they can better 
allocate the resources required to build, deploy, and host a virtual environment. 


6.3.2 Evaluating Usage 

Over the course of developing and deploying virtual environments, we have 
evaluated the usage of the environments over time through a range of methods:
a) data collected independently by other marketing and research groups; and,
from our own work, b) informal observations of world builders and end users in
the graphical environments; c) interviews and focus groups; d) online surveys;
e) collection and analysis of quantitative usage data; and f) experimental studies.


6.4 Lessons Learned 

Since we started six years ago in 1995, world builders and end users have become 
significantly more experienced building and deploying virtual environments. 
Nevertheless, six years ago we would have incorrectly predicted that graphical 
synchronous virtual environments, particularly 3D environments, would be used 
extensively for a broad variety of scenarios, ranging from entertainment, to 
learning, to business. Some of the lessons that follow describe not only what makes 
a virtual environment work, but also why some did not work. 
The central lessons we learned for fostering sustainable, dynamic communities
fall into the following general areas: 


Individuals: 

1. Provide persistent identity to encourage responsible behavior, individual 
accountability, and the development of lasting relationships. 

2. Support custom profile information that addresses the privacy concerns of 
individuals. 

3. Encourage individuals to invest in their self-representation by supporting 
custom end user graphical representations. 

Social Dynamics: 

4. Support the ability for groups to form and then self-regulate. 

5. Frequent and repeated interactions promote cooperative behavior. Help 
people coordinate finding and meeting those they care about to increase 
the likelihood of positive interactions. 

6. Make community spaces more compelling by supporting the development 
of reputation and status. 

Context, Environments and User Interface: 

7. End users and world builders preferred 3D, non-abstract environments 
with a third person point of view. 

8. Graphical environments provide valuable context for non-verbal 
communication. 

9. Different communities have different needs and require different user 
interfaces. 

We will now go into each of these lessons in more detail, presenting first an 
overview of the issues involved in each case, and then charting the course of how 
the design evolved. 


6.5 Lessons Learned: The Individual 

We found that individuals invest more in their online representation and they 
interact more responsibly with one another when they have a persistent identity and 
a rich representation. In addition, users feel more comfortable when their privacy 
concerns have been addressed. 
6.5.1 Lesson 1: Persistent Identity

Provide persistent identity to encourage responsible behavior, individual 
accountability, and the development of lasting relationships. 

Overview: Without persistent user identity (ID), people find it difficult to develop 
lasting relationships. Providing persistent IDs for users allows them to be uniquely 
identified across sessions and thus able to relocate each other and develop lasting 
impressions of each other. Without persistent identity, people feel less accountable 
for their behavior, leading to more “bad behavior” such as identity “theft” or 
spoofing. Persistent user identity allows users to invest in their online reputation, 
encouraging them to be accountable in their interactions with others and to act 
more “responsibly”. 

Design Evolution: The initial V-Chat environment, v1.0, supported persistent
identities and a relatively closed community of users. V-Chat v1.0 provided
persistent user IDs by integrating with the Microsoft Network and using MSN IDs.
When V-Chat was released, MSN members paid a monthly fee for their 
subscription to the MSN service. Each user had a unique MSN user name and 
password. V-Chat users appeared in V-Chat with their unique MSN name, so it 
was easy for individuals to identify one another by name. The core group of
V-Chat users met over and over in the space, and it was easy for them to identify each
other and develop “good” or “bad” reputations within the space. 

The value of the “secure” MSN community and the persistent user identity 
became clear to the users when it was taken away in V-Chat v2.0, which was 
accessible to anyone on the Internet. Any user with access to the Internet could 
access the V-Chat 2.0 rooms and enter with any user name. The old users 
complained that many of the new users were “not behaving responsibly.” People 
were being spoofed, and there was no way to know who was who. 

For example, members of the Angel Society club complained of “spoofing”. The 
Angel Society was a V-Chat club founded by end users that supported new users 
and helped manage the various V-Chat rooms (the Angel Society is described in 
more detail in Lesson 4). The membership in this club was by invitation only, and 
members had high status within the V-Chat community. To show membership the 
Angels adopted angel avatar costumes. Spoofers faked membership to the Angel 
Society club by adopting angel names, e.g., “Angelxxx”, and by wearing angel 
graphics. The members of the Angel Society found these spoofers quite offensive, 
and wanted them to be punished. Rather than punish users, we attempted to reduce 
the ease of spoofing in the space. We prevented users from logging in at the same 
time with the same name and we put the user’s network IP address in the public 
profile. This made it more difficult for users to pretend they were someone else. In 
addition, we made the rules of conduct visible during login, and we banned users 
that did not follow these rules. In sum, we found it necessary to include system 
features in the environments that let us uniquely identify and ban badly behaving 
users, and to clearly display the public policy for dealing with bad behavior. 
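The anti-spoofing measures described above can be sketched as a small session table. There are three of them: refusing a second simultaneous login under the same name, showing the network address in the public profile, and banning users who break the rules. This is a minimal sketch under stated assumptions, not V-Chat's code. The class `SessionRegistry`, its methods, the case-insensitive name matching, and banning by IP address are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRegistry:
    """Hypothetical session table illustrating the anti-spoofing measures.

    Not V-Chat's actual implementation; names and fields are assumptions.
    """
    active: dict = field(default_factory=dict)    # lower-cased user name -> IP address
    banned_ips: set = field(default_factory=set)  # addresses of banned users

    def login(self, name: str, ip: str) -> bool:
        """Admit a user only if the address is not banned and the name is not already online."""
        if ip in self.banned_ips:
            return False
        key = name.lower()  # assumption: "Angel42" and "angel42" count as the same name
        if key in self.active:
            return False    # refuse a second simultaneous session under the same name
        self.active[key] = ip
        return True

    def logout(self, name: str) -> None:
        self.active.pop(name.lower(), None)

    def public_profile(self, name: str) -> dict:
        """Show the network address next to the name so impersonators are easier to spot."""
        return {"name": name, "ip": self.active.get(name.lower(), "offline")}

    def ban(self, name: str) -> None:
        """Disconnect an online user and refuse further logins from their address."""
        ip = self.active.pop(name.lower(), None)
        if ip is not None:
            self.banned_ips.add(ip)
```

Keying bans to an IP address is a blunt instrument, since shared or changing addresses make it imprecise. The sketch uses it only because the text names the IP address as the identifying detail exposed to other users.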

With V-Chat 2.0, even in the absence of technical solutions, we found that 
regular users developed a persistent identity in the environment, and used their 
identity to make friends and build relationships. Most regular users had a regular 
name (or set of names) and graphic (or set of graphics) and they often had regular
habits like showing up in the same place around the same time. They also often 
knew the habits of their friends. All of the members in our focus groups knew their 
online friends by their user names and their avatars. They tended to meet each 
other in a particular environment at a particular time of day. 


6.5.2 Lesson 2: Custom Profiles 


Support custom profile information, including features that address privacy 
concerns. 

Overview: Without a standard way to access and share the profile information of 
others, users find it difficult to get a sense of “who else” belongs to the community. 
With no mechanism to share relevant information, users must repeat the same 
general information over and over or post the information in a custom location. 

We provided a variety of solutions for recording and displaying profile 
information. In some cases we used standard profile forms, in others we provided 
community-specific custom profile forms. In addition, in some systems we let 
users reveal different levels of personal information for different audiences. For 
example, users could give their friends more access to personal information than 
strangers. 

Design Evolution: We expected that user profiles would allow users to provide 
standard information (including name, age, location, sex, etc.) that would spark 
conversation without having to repeat the same information over and over in text 
chat. However, we found that people did not always provide profile information. 
When V-Chat was on MSN (a more secure community), more users tended to fill 
in the optional profile information with “real” information than when V-Chat was 
released on the Internet. Nevertheless, in both cases we found that often people did 
not fill in the profile information or filled it in incorrectly. For example, a common
response to the “Sex: ___” item was “none” or “yes”, rather than “male” or
“female”. Often this item was left blank or filled in with false information.

When V-Chat 2.0 was released on the Internet, in addition to the standard profile 
dialog, users were prompted to write a description about themselves. If they left 
their personal description blank, they were described by “This user has nothing to 
say”. We found the open-ended text description area was filled in by more users 
and was more interesting to others than the standard profile fields. 

In all cases, we found that some of the most common questions in the text chat 
conversation were questions that could have been answered in the profile. These 
included questions like, “Where are you from?”, “How old are you?”, or “ASL?” 
(age, sex, location). Often people would look at the profile information and then 
ask “Are you really from {xxx} place?” or “Are you really {xxx} age?”. We
believe that users did not provide or consult profile information for several
reasons: looking at a profile was too much trouble because it was “too many clicks
away” in the user interface, the information was too public, and users preferred to
ask the questions in real time as a “conversation starter”.
As in the real world, it was common for V-Chat users to disclose different
amounts of information based on their personal relationships and their level of 
expertise. V-Chat users disclosed more private information over time. This was not 
facilitated by any specific privacy-related profile features in the software, but 
occurred via chatting in the application, and by exchanging email, instant 
messaging IDs, photos, phone numbers and face-to-face meetings. 

We further supported different levels of self-disclosure in HutchWorld, a virtual 
world that provided social support for cancer patients and their caregivers at the 
Fred Hutchinson Cancer Research Center (the Hutch). HutchWorld was built using 
the Virtual Worlds Platform and was deployed at the Hutch in the summer of 2000. 
In HutchWorld, users could specify profile items as “private”, “friends only”, and 
“public”, and users could access other members’ profiles regardless of whether the 
users were online or offline. We found that people in the private community 
actively used the profile information. User logs showed that people not only 
provided extensive, accurate profile information, but also read each other’s profiles 
extensively. 
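HutchWorld's three visibility levels amount to a simple access rule evaluated for each profile item. The sketch below assumes a hypothetical data layout: a dictionary of items, each tagged with a level, plus a one-directional friends list per owner. The function `visible_items` and the `Visibility` enum are illustrative names, not part of the Virtual Worlds Platform.

```python
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"
    FRIENDS_ONLY = "friends only"
    PUBLIC = "public"


def visible_items(profile: dict, owner: str, viewer: str, friends: dict) -> dict:
    """Return the profile items `viewer` may read.

    profile: item name -> (value, Visibility)
    friends: user -> set of that user's friends (hypothetical structure)
    Whether the owner is online plays no role, matching the HutchWorld
    behavior described in the text.
    """
    is_owner = viewer == owner
    is_friend = viewer in friends.get(owner, set())
    shown = {}
    for item, (value, level) in profile.items():
        if (is_owner
                or level is Visibility.PUBLIC
                or (level is Visibility.FRIENDS_ONLY and is_friend)):
            shown[item] = value
    return shown
```

Checking access item by item, rather than for the profile as a whole, is what lets a user share a home town with friends while keeping a phone number private.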

In sum, users appeared to prefer profiles with open-ended personal statements to 
structured text fields. Furthermore they were more comfortable providing profile 
information in a closed community and when they could control the level of access 
to their profile information. 


6.5.3 Lesson 3: Self-representation 


Encourage users to invest in their self-representation by supporting custom end 
user graphical representations. 

Overview: Initially, we could not predict whether or not people would create and 
share avatars in a graphical chat environment, and whether or not they would use 
their graphic to represent some aspect of their identity. We were delighted and 
surprised by the popularity of the custom avatar graphics, and the impact of the 
graphical avatar on conversation and status in the community. Users invested in 
their own personal representations; they enjoyed sharing their own drawings and 
images, and they enjoyed selecting and creating their graphical representations. 
Most people used the graphic to convey something about their true identity,
particularly gender. Other users, in turn, used this information to help identify
interesting people to talk to and people to avoid.

Design Evolution: In V-Chat we provided a variety of solutions for adding 
graphics to the user’s representation. A specific 2D “sprite” avatar format was
hard-coded in the design of the application (“sprite” has become a common term for 2D
user representations in 3D environments). V-Chat provided a standard set of 20 
avatar graphics plus the ability for end users to design custom avatar graphics. The 
graphics included 20 frames, allowing the avatar to animate different gestures. 
These avatars represented a variety of styles. 

During the development of V-Chat avatars, we argued about which style of 
graphics would be used most (abstract, photographic, human, animal, etc.). Some 
argued that abstract images would be used most, as they were more creative. 
Others preferred sillier, cartoon-like avatars. Still others wanted sexy avatars
or lifelike/photographic avatars. We decided to release V-Chat with a set of 20 
avatars. These avatars included a variety of image types - and we watched to see 
what type of avatar graphics people would actually use. 

With V-Chat, we agreed that end users would want to create their own custom 
graphics, but we argued about how much time and effort the end user would spend 
creating pictures. Creating a custom avatar using the V-Chat avatar wizard was a 
lot of work. It required that the user download a separate application and create up 
to twenty separate images for their graphical avatar. In addition, users needed to 
upload their avatar and specify who could use the image by marking it “share, let 
others use” or “private, for my use only”. We wanted to encourage end users to 
create custom avatar graphics, but we thought that in reality this process might be 
too difficult and tedious for users. 

We developed an understanding of how people used avatars through: a) an
avatar contest; b) informal observations of the usage of graphical avatars; and c) a 
study conducted by two sociologists from UCLA, Peter Kollock [15] and Marc 
Smith [16]. 

In the spring of 1996, about 6 months after the release of V-Chat, we held an 
avatar graphic contest. One goal of the contest was for us to determine how users 
represented themselves (human, animal, male, abstract, etc.). For our avatar 
contest, any user could submit an avatar design. A panel of judges determined the 
best designs in each of the following categories: human-male, human-female, 
animal, objects, abstract, and child. Winners received t-shirts and buttons, and their 
names were posted in the avatar contest section of the site. Of approximately 75 
submissions, over half were realistic human figures (33 male and 15 female). The
rest were divided equally between animals and a variety of objects. Although no
one submitted a photographic avatar, in general, men submitted male figures and
women submitted female figures.

Around the same time, shortly after V-Chat was released in 1996, Kollock and 
Smith [17] evaluated the “popularity” of avatar graphics by observing the usage of 
different types of avatars (general, room specific, end user custom) in each 
different room, and recording the frequency of use by analysis of V-Chat log files. 
They repeated this study in 1998 [18]. 

Kollock and Smith’s analysis matched our anecdotal evidence and the avatar
contest data. In 1996, for avatars used in V-Chat (including both the default “set”
of avatars and end user custom graphics), the most common were male figures, then
female figures, and then a variety of animals, objects and other designs. By 1998,
humans remained the most common avatar, but rather than males, young women became
the dominant theme for the most popular custom avatars, with many cartoon and
science fiction characters present [18]. Like in the avatar contest, photographs of 
the self as an avatar were not popular, but users did represent themselves with 
photos of famous or attractive, sexy people. The least used designs were the purely 
abstract designs. 

Kollock and Smith found that custom avatars were one of the most popular
features of V-Chat. In the 1996 study, they found that 21% of all avatars used in
V-Chat were custom avatars, created by end users. When the study was repeated in
1998, they found that 87% of the avatars in use were custom avatars. This
represented a very substantial increase in the number of custom avatars [18].


Figure 6.3 General V-Chat avatar designs.


They found that different users invested different amounts of time in the
community. People who used custom avatars spent, on average, more time in
V-Chat. In total, 37% of all time spent in any avatar was spent by users wearing a
custom avatar [18]. From our informal observations and through an online survey
[19], we found that users used the avatars to represent an aspect of their identity
(e.g., gender) and their intentions in the space (e.g., to flirt), but they did not tend
to pick representations that were too specific (photos of themselves) or too
abstract.
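The figures above measure two different things. One is the share of avatars in use that were custom (21% in 1996, 87% in 1998). The other is the share of total time spent in a custom avatar (37%). A minimal sketch of how both kinds of measure could be computed from usage logs follows. The `Visit` record format and the choice to count visits rather than distinct avatars are assumptions. The sample numbers are invented for illustration and are not taken from Kollock and Smith's data.

```python
from typing import NamedTuple


class Visit(NamedTuple):
    """One hypothetical log record: who was present, in which kind of avatar, for how long."""
    user: str
    avatar_kind: str  # "general", "room", or "custom"
    minutes: float


def custom_avatar_shares(visits: list) -> tuple:
    """Return (share of visits made in a custom avatar, share of total time spent in one)."""
    if not visits:
        return 0.0, 0.0
    total_minutes = sum(v.minutes for v in visits)
    custom = [v for v in visits if v.avatar_kind == "custom"]
    by_count = len(custom) / len(visits)
    by_time = sum(v.minutes for v in custom) / total_minutes if total_minutes else 0.0
    return by_count, by_time
```

Take four visits totalling 60 minutes, one of which is a 30-minute visit in a custom avatar. The function returns `(0.25, 0.5)`: a quarter of the visits but half of the time. That gap between the count-based and time-based shares is what appears when custom-avatar users stay longer.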

As with the profile information, the use of these different graphical images 
encouraged social interaction. They helped users identify those they wanted to 
interact with and those they wanted to avoid. Like text-based profile information, 
the graphics were a popular topic of conversation. Users often asked an unfamiliar 
user with a custom avatar “Did you make your custom?” or “Nice custom 
[avatar]”. In addition, users would often avoid others whose custom avatars
they found unappealing.


6.6 Lessons Learned: Social Dynamics 

Individuals came together to form groups in the V-Chat and Virtual Worlds
environments. We found that supporting the formation of groups and clubs, 
supporting the development of reputation and status, and providing a mechanism 
for groups to self-regulate - all encouraged the development of community in the 
virtual environment. 
6.6.1 Lesson 4: Forming Groups


Support the ability for people to form into groups and then self-regulate. 

Overview: When world builders take on the responsibility for managing social 
dynamics, they can interfere with community members taking on a more active 
leadership role in their own community. World builders can host events and create 
clubs, but this is costly, does not scale well, and often does not respond to the 
needs of the community. Providing the ability for people to form into groups and 
then self-regulate by moderating the space and managing bad behavior is key for 
developing a self-sufficient community. 

Design Evolution: Early on in the deployment of V-Chat, we recognized the need 
for “setting a tone”, managing the space, and scheduling and hosting events, but we 
did not have the staffing or the experience to engineer social interactions for the 
emerging community. 

We did provide a part-time forum manager for the rooms, but it was 
unreasonable to expect her to be able to manage all of the social interactions and to 
be responsible for promoting the development of community in the 20 or so
V-Chat spaces. In addition, we were not sure if it was desirable for us to monitor the
rooms. Often when a host or forum manager entered the room, we observed users 
immediately leaving the room. The forum manager was not part of the end user 
community and was often viewed as an interfering authority figure, an outsider. 

At the same time we also observed usability problems for new users. New users 
did not know how to move or interact with others in the graphical environment. 
Several experienced users wanted to be able to help the new users, and offered to 
form a club to help others in the V-Chat community. 

We thought a helper club would be useful, but we did not want to create status 
differences by favoring a particular set of users. We consulted with sociologist 
Peter Kollock to design a system to support the development of user-generated 
clubs. Kollock suggested that we provide a clearly defined and fair system, such as 
a mechanism for anyone to create a club. He also suggested that we not give clubs 
special privileges (such as the ability to kick or ban users from the space) unless 
absolutely required. 

We followed his advice and designed a system to support the creation of clubs 
and groups. On the main V-Chat page, we provided a club application form. Any 
user who wanted to form a club needed to submit the user names of 10 people who 
supported the club, along with a description of the club and the name of a 
designated club owner. Once the club was created, we offered to display a graphic 
promoting the club in the 3D V-Chat lobby, Compass. 
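To make the club rule concrete, here is a minimal Python sketch of the application check. It is an illustration under stated assumptions, not the V-Chat implementation: the `ClubApplication` record, its field names, and the case-insensitive de-duplication of supporter names are hypothetical. Only the 10-supporter threshold, the club description, and the named owner come from the text.

```python
from dataclasses import dataclass, field

# Threshold taken from the text: 10 supporting user names per application.
REQUIRED_SUPPORTERS = 10


@dataclass
class ClubApplication:
    """Hypothetical shape of a club application form."""
    name: str
    description: str
    owner: str
    supporters: list[str] = field(default_factory=list)


def validate(app: ClubApplication) -> list[str]:
    """Return a list of problems; an empty list means the form is acceptable."""
    problems = []
    if not app.description.strip():
        problems.append("missing club description")
    if not app.owner.strip():
        problems.append("missing club owner")
    # Assumption: names are de-duplicated case-insensitively, so one person
    # cannot be listed twice to reach the threshold.
    distinct = {s.strip().lower() for s in app.supporters if s.strip()}
    if len(distinct) < REQUIRED_SUPPORTERS:
        problems.append(f"need {REQUIRED_SUPPORTERS} supporters, got {len(distinct)}")
    return problems


app = ClubApplication("Angel Society", "Volunteers who help newbies", "Angel Amy",
                      [f"user{i}" for i in range(10)])
print(validate(app))  # []
```

Returning a list of problems rather than a single yes/no lets a form page tell the applicant everything that is missing at once; the rule itself stays as simple and uniform for every applicant as the text describes.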

V-Chat users told us that many newbies in the space were having difficulty 
communicating with others and navigating in the 3D spaces. They suggested 
forming a group of volunteers to help newbies. The helper community that 
emerged called itself the “Angel Society.” The Angel Society was not granted any 
special privileges, and we did not interfere with the policies and rules its members 
designed for themselves. The Angel Society members created their own guidelines 
and recruited members. Members had the self-assigned privilege of wearing a 
shared custom angel avatar graphic, and they later developed a naming convention 
of the form “Angel xxx”. The Angels organized themselves and created a schedule 
for helping new users. We were not sure whether this would be enough to create a 
club; contrary to our expectations, being a member of the Angel Society became a 
V-Chat status symbol. In fact, the Angel Society remained with the V-Chat product 
longer than any of the members of our development team who worked on the project! 

We did not support user-generated rooms in V-Chat v1.0. We found that people, 
especially advanced users, wanted to self-regulate their group. They wanted to 
create their own custom spaces to attract their own interest groups, and they 
wanted to manage, design, and maintain the space themselves. 

In V-Chat v2.0, we supported user-created rooms. Any user could design, name, 
describe, and assign hosts in a custom room. As in many Internet Relay Chat 
(IRC) systems, to create a room, users typed in the room name and a one-line 
description of the room. In addition, we let users select a room graphic. Each room 
and the number of people currently in it were listed in a directory. 

In the user-created rooms, the owner/creator of the room self-regulated the space 
by setting up membership rules and designating particular users as hosts. Hosts 
could kick out and ban users from the room and could grant host privileges to 
others. Hosts were typically key members of the group. 

To restrict membership in the room, creators scripted automated bots that would 
look through user profiles and automatically “kick” users who were not valid 
members of their community (filtered by age or other profile fields). Through these 
means, the owner could regulate a custom room. 
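The membership bots can be pictured as a rule applied to each occupant's profile. The Python sketch below is hypothetical throughout: the `Profile` fields, the minimum-age-plus-interest rule, and the `kick` callback standing in for the server's kick command are assumptions. The text says only that bots filtered users by age or other profile fields.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Profile:
    """Hypothetical subset of a user profile."""
    user: str
    age: Optional[int]          # None if the user left the field blank
    interests: frozenset[str]


def make_membership_rule(min_age: int, required_interest: str) -> Callable[[Profile], bool]:
    """Build a predicate that decides whether a profile belongs in the room."""
    def is_member(p: Profile) -> bool:
        # Assumption: a blank age field fails the age check.
        if p.age is None or p.age < min_age:
            return False
        return required_interest in p.interests
    return is_member


def sweep_room(occupants: list[Profile],
               is_member: Callable[[Profile], bool],
               kick: Callable[[str], None]) -> list[str]:
    """Kick every occupant whose profile fails the rule; return who was kicked."""
    kicked = []
    for p in occupants:
        if not is_member(p):
            kick(p.user)  # hypothetical hook into the chat server's kick command
            kicked.append(p.user)
    return kicked


rule = make_membership_rule(min_age=18, required_interest="jazz")
room = [Profile("ann", 30, frozenset({"jazz"})),
        Profile("bob", 15, frozenset({"jazz"})),
        Profile("cy", None, frozenset())]
print(sweep_room(room, rule, kick=lambda user: None))  # ['bob', 'cy']
```

Keeping the rule separate from the sweep mirrors the division of labour in the text: the room owner decides who counts as a member, and the bot only enforces that decision.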

The users’ ability to create and manage their own rooms made a significant 
difference in usage of the application. When end users were allowed to create 
rooms dynamically, the number of rooms created by end users greatly exceeded the 
number of rooms that were provided by the V-Chat team. In the directory, there 
were typically about 100 V-Chat environments up and running and of these, about 
80% were end-user created. Although many of these rooms had only a few 
participants, many social groups formed around the various user-created rooms. 
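The room directory described above (room name, one-line description, current occupancy) and the share of user-created rooms can be sketched as follows. This is an illustration, not V-Chat code: the `RoomEntry` record, the `user_created` flag, and the busiest-first ordering are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RoomEntry:
    """One line in a hypothetical room directory."""
    name: str
    description: str   # the one-line description typed in by the creator
    occupants: int
    user_created: bool


def listing(rooms: list[RoomEntry]) -> list[str]:
    """Format the directory, busiest rooms first (ordering is an assumption)."""
    ordered = sorted(rooms, key=lambda r: (-r.occupants, r.name.lower()))
    return [f"{r.name} ({r.occupants}) - {r.description}" for r in ordered]


def user_created_share(rooms: list[RoomEntry]) -> float:
    """Fraction of listed rooms that were created by end users."""
    if not rooms:
        return 0.0
    return sum(r.user_created for r in rooms) / len(rooms)


rooms = [
    RoomEntry("Compass", "Main lobby", 25, False),
    RoomEntry("Jazz Club", "Talk about jazz", 3, True),
    RoomEntry("Chess Hall", "Casual chess chat", 7, True),
]
for line in listing(rooms):
    print(line)
print(user_created_share(rooms))
```

With a directory of about 100 rooms, 80 of them user-created, `user_created_share` would return 0.8, matching the proportion reported in the text.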


6.6.2 Lesson 5: Cooperative Behavior 


Frequent and repeated interactions promote cooperative behavior. Help people 
coordinate finding and meeting those they care about to increase the likelihood of 
positive interactions. 

Overview: Frequent and repeated interactions between users encourage users to act 
responsibly. Users are more likely to cooperate if they believe that they will see 
others again. Being unable to coordinate finding similar people or meeting friends 
in a virtual environment is a barrier to entry. Often it takes too much effort to find 
interesting people and the areas of social activity. Providing mechanisms to let 
users find friends, find active groups of people with similar interests, and create 
user-generated interest groups is key for encouraging cooperative behavior. 

Design Evolution: In V-Chat version 1.0, through survey data [19], we found that 
users wanted to meet their friends online. Many frequent users coordinated meeting 
friends online by using external tools like instant messaging and buddy lists, in 
particular the product ICQ [20]. People who used instant messaging to meet each 
other in V-Chat spaces reported more frequent and higher quality interactions. (We 
should mention here that, at this time, ICQ was the only buddy list product 
available on the Internet.) 

V-Chat version 1.0 users also looked for groups of people with similar interests 
and intentions, but it often took time to find a group, and many users never 
invested enough time to find people with similar interests. 

To address this issue, we watched the V-Chat community to see when and where 
different groups of people “congregated”. Our general space, named Compass, was 
the most popular space, and members of different user groups, particularly the teen 
and adult users, would often disturb general chat users and new users (called 
newbies). Rather than formally separating the groups or setting up complex social 
rules, we supported these groups by describing the room audience in the directory. 
In general we found that teens preferred to socialize with other teens and the adult 
audience preferred to socialize with other adults. V-Chat became more pleasant for 
all of the different audiences when people were able to separate into their 
respective groups. 

The adult content users were the most advanced users. They included not only 
those interested in “adult” behavior, but also those interested in interacting with a 
mature audience. Some world builders classified their rooms as “adult” to 
discourage young users from entering (at that time on MSN only users 18 and over 
could enter rooms marked “adult”). In the adult rooms, the users were generally 
older and also more experienced. Users tended to have more private conversations 
(via whispering in text chat) and use more custom avatars. The adult rooms were 
particularly popular late in the evening. 

The teen community tended to come online in the late afternoon, after school. 
This group primarily consisted of teenage boys, who particularly enjoyed 
interacting in the 3D space. They ran around the room, played tag, and zoomed 
close up to other avatars. They were also very active chatters. The teens’ group was 
particularly attracted to a room we called “Lunar Islands”, a sci-fi looking 
landscape. 

The newbies, mentioned earlier, were often new to text chat, and had little 
experience using 3D graphics. The teen and adult users were particularly 
intimidating to the new users. Initially we created a separate environment for 
newbies, but this space was almost never used. 

In V-Chat version 2.0 we integrated a directory service, typical of an IRC chat 
application. It listed each space's name, a text description, and the number of 
people currently in it, making it easier for users to find those with similar interests. 

In summary, letting users find either people they care about or similar others was 
key to increasing the likelihood of positive interactions and discouraging bad 
behavior in the virtual environment. 


6.6.3 Lesson 6: Reputation and Status 


Make community spaces more compelling by supporting the development of 
reputation and status. 

Overview: We found that the development of reputation and status was an 
important aspect of a compelling social experience for our V-Chat users. Users 
developed reputation and status in V-Chat using various mechanisms. The primary 
items people used to build status were: a) hours online (more advanced skills); b) 
friends online (having popular friends); c) formal club membership (for example 
the Angel Society); d) artistic talent (displayed via avatar creation); and e) 
exploration of the environment (discovering secret places in the 3D spaces). 

Design Evolution: Most of the items to which V-Chat users attached status were 
not intentionally designed as a means toward building reputation in the community. 
However, as the status items emerged in the V-Chat community, and as we 
developed an understanding of their importance in the process of community 
development, we encouraged the behavior by supporting and emphasizing those 
features. 

We found that users were passionate about the status-related features. To some 
extent these status features drove the usage of V-Chat and the social dynamics in 
the space. Users were so passionate about custom avatars that they formed sites 
and communities around avatar exchange. They bragged about finding secret 
places and shared these secrets with their friends. They complained about members 
of different exclusive groups and enjoyed the status of being popular in the space 
and of spending upwards of 8 hours a day in the virtual environment. 

In particular, we found that users greatly enjoyed creating, sharing, and selecting 
avatar graphics. Wearing a custom avatar graphic or creating avatar graphics 
became status symbols and conversation starters. In V-Chat, when a user posted a 
custom avatar, it was stamped with the name of the creator, and these avatar 
graphics were exchanged and worn as badges of status. Several users became 
famous for creating widely used custom avatars. In fact, custom avatars were so 
popular that some users set up custom avatar websites where others could access 
and exchange custom graphics. 

Another sign of one’s status was an awareness of the Easter eggs - secret items 
or places to explore and visit - that the V-Chat designers hid in the V-Chat spaces. 
One popular V-Chat room was the Fishbowl. In Fishbowl, there was a small hole 
in the background collision detection plane. Users could escape out of the hole and 
go behind the background, to reveal a secret stage set. In another general use space, 
a large-scale desk we named Desktop, we put a piece of bubble gum under the 
table, and the artist put his own face on the coins that were on the table. In our 
main lobby, Compass, we hid a triangle high above the center of the room. Inside 
the triangle was a bright red room. In our sci-fi landscape, Lunar Islands, we hid 
words behind the two moons. If you went out to the moons and looked back at the 
world, the world would seem to disappear. 

We thought we would give the users in the focus group a treat by revealing the 
secrets of each room. They surprised us by telling us “Oh, we know all of 
those”. They then went on to reveal other Easter eggs that our designers 
had hidden in the world, such as initials on the backs of textures and other 
secret objects and places. Every single secret and hidden Easter egg we put in the 
space delighted our advanced users. They shared this knowledge with one another 
and gained status through this exploration of the space. 


6.7 Lessons Learned: Environments and the User Interface 

We found that the design of an environment and its user interface affected the 
nature of the environment’s social interactions. In addition, we found that different 
communities have different needs, both in terms of the environment and of the user 
interface. 


6.7.1 Lesson 7: Graphical Environments 

End users and world builders preferred 3D, non-abstract environments, and they 
preferred seeing themselves in context in the space when interacting with others. 

Overview: Our V-Chat design team had many heated debates regarding 2D versus 
3D environments, abstract versus literal design styles, and first person (looking 
through my eyes) versus third person point of view (I see myself in context). In 
V-Chat, we provided a variety of 2D and 3D environments with different points of 
view. Within these environments, we also varied the design style from abstract to 
literal. We found that V-Chat users and world builders preferred the 3D designs 
to the 2D designs, preferred non-abstract environments, and found the third 
person, over-the-shoulder point of view easier to use than the first person 
point of view. 

Design Evolution: 2D and 3D environments have different advantages and 
disadvantages. The 2D environments are typically easier to navigate and create, 
and users can see themselves in the context of other users. The 3D spaces are 
typically more immersive, but are more difficult to create and navigate. 

In general, we found that V-Chat users preferred the 3D V-Chat rooms 
to the 2D rooms. In a focus group, one woman said that she did not like a 2D 
room because it was “all flat”. Users did not necessarily understand the difference 
between 2D and 3D spaces, but they said they liked “standing in across from other 
users and moving around”. Our world builders also preferred the 3D rooms to the 
2D rooms. Although our team helped create two 2D rooms for third parties, all of 
the other environments (about 30) posted on MSN in the first year were 3D spaces. 

We also varied how abstract or literal the spaces were. We found that the users’ 
preferences for environment design matched their preferences for avatar design. 
As with the avatars, we found that the abstract room designs were the least 
popular. The least used room was BugWorld, an abstract 2D design 
(see Figure 6.4). The abstract 3D rooms were also rarely used. However, users also 
did not ask for lifelike, photographic spaces. They liked Tabletop and Fishbowl, 
which were built on recognizable metaphors but were not replicas of real places. 



Figure 6.4 BugWorld V-Chat. 

Finally, we allowed users to choose either a first person or a third person view in 
the 3D spaces. Using a first person point of view, the user views the world as 
though looking through their eyes. Using the third person view, users view the 
world as if looking over their own shoulder. We found that while the first person 
view was more immersive, users would often not realize that they were visible in 
the environment because they could not see themselves. In addition, it was difficult 
for users to position themselves so that everyone in the group could see one 
another. 

In summary, 3D non-abstract environments were preferred over 2D 
environments in V-Chat. Nevertheless, despite advances in hardware and 
networking, 3D graphical environments have not become widely popular except in 
the context of games and entertainment. Further research on the uses of 3D 
in different contexts could help explain when 3D is most beneficial. 


6.7.2 Lesson 8 


Graphical environments provide valuable context for non-verbal communication 
and gesture. 

Overview: Graphical environments allow people to communicate non-verbally. 
Users can express emotions through gestures, and communicate interest and 
direction of attention through their orientation and position in the 3D space. 
However, the use of graphical features to communicate non-verbally often 
interferes with verbal, text communication. 
Design Evolution: As with the avatar graphics, before releasing V-Chat we 
discussed the different gestures and the different ways of “talking”. We were not 
sure which gestures would be used (wave, smile, flirt, sad, silly, etc.) and we were 
unsure how often the different ways of speaking (say, think and emote) would be 
used. 

V-Chat users could interact with one another in the 3D space by typing or 
explicitly controlling their graphical avatars. In some cases, the gestures were 
automatically triggered via the content of the text conversation (for example, by 
typing “hello” the avatar would wave) and in other cases pushing buttons in the 
user interface would trigger gestures. 
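The text-triggered gestures can be sketched as a keyword lookup over each chat message. Only the “hello” to wave mapping comes from the text; the other keywords, the `GESTURE_KEYWORDS` table, and the first-match rule are illustrative assumptions, not V-Chat's actual trigger list.

```python
from typing import Optional

# Hypothetical keyword table; only "hello" -> wave is taken from the text.
GESTURE_KEYWORDS = {
    "wave": ["hello", "hi", "bye"],
    "smile": [":)", "lol"],
    "sad": [":("],
}


def gesture_for(message: str) -> Optional[str]:
    """Return the first gesture whose keyword appears as a whole token, else None."""
    # Split on whitespace so emoticons such as ":)" stay intact, then trim
    # trailing punctuation so "Hello!" still matches "hello".
    tokens = [t.strip(".,!?").lower() for t in message.split()]
    for gesture, keywords in GESTURE_KEYWORDS.items():
        if any(k in tokens for k in keywords):
            return gesture
    return None


print(gesture_for("Hello everyone!"))  # wave
print(gesture_for("this is fine"))     # None
```

Matching whole tokens rather than substrings keeps a word such as “this” from triggering the wave attached to “hi”, which matters when gestures fire automatically during ordinary conversation.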

We found that the most popular gestures were positive gestures: waves (23%), 
smiles (17%), and flirts (11%). The least used gestures were negative gestures: 
angry (8%), shrugs (6%), and sad (5%) [17]. Over time, the use of positive gestures 
continued to be the most common [18]. 

Gestures were used less than expected, for two reasons. First, the user interface 
required users to focus either on the interaction in the 3D space or on the text chat 
history window. Second, to make the avatar gesture, users needed to take their 
hands off the keyboard and use the mouse to push discrete buttons in the 
user interface. 

As a result, many users began their sessions by exploring the 3D space, but then, 
as they engaged in text chat, they would reduce the size of the 3D window and stop 
moving in the 3D space while focusing on the conversation. It also seemed that 
people would forget to gesture when engaged in conversation, or would instead use 
conventional chat emoticons to express non-verbal information. 

Not only could people communicate non-verbally through gestures, they could 
also communicate non-verbally through their position in the 3D space. Casual 
observation suggests that users were quite aware of non-verbal communication in 
the spatial context. Phrases such as “someone coming over to me”, “someone in 
my face”, “someone walking through me”, “come here”, and “look at me”, were 
common. We also observed small clusters of 2 or 3 people positioning themselves 
off in the distance in what looked like private conversations. 

We expected that end users and world builders would prefer the 3D rooms to the 
2D rooms in part because they allowed for richer non-verbal communication. To 
study non-verbal communication in the 3D space explicitly, for several weeks in 
1999 we collected position and conversation data of users in several of the public 
V-Chat spaces [20]. We found that people did use the 3D space for non-verbal 
communication. They used their ability to position in the 3D environment to stand 
near and to look at the person with whom they were talking. Furthermore, because 
they were able to communicate the direction of their attention non-verbally, they 
were less likely to address their chat messages with user names than if they 
communicated only through text chat. 


6.7.3 Lesson 9: Custom User Interface 


Different communities have different needs and require different user interfaces. 
Overview: Different communities have different needs. Depending on the size, 
location and goals of a community, different features and user interfaces were 
required to support its virtual environment. In the Virtual Worlds Platform, to 
better support world builders, we supported active scripting and DHTML (Dynamic 
HTML, a combination of standard web technologies), letting world builders use 
familiar tools to customize the user interface of their virtual environment. 

Design Evolution: In V-Chat we did not allow world builders to customize the user 
interface of the V-Chat application. Initially, we imagined that users would 
primarily be interested in meeting others and joining a general chat room with 
twenty or more users. Our world builders had different goals. We learned that 
people interested in “general spaces” had very different requirements from people 
working with specific communities. In the general spaces, finding others with 
similar interests was an issue. In a small, specific community, building critical 
mass was a much more important issue. 

World builders wanted to “control” the artistic look and feel of the space, and the 
design of the user interface. Many had small communities, for which synchronous 
text chat did not work well, because the space was often empty and people arrived 
at different times. Our profile was geared toward finding others currently in the 
space, and provided only general information about a user. It did not provide useful 
information to communities of users who already knew one another. In addition, 
different communities had different access policies. The “open to everyone” policy 
we established for our general V-Chat rooms did not work for private 
communities. 

To address these concerns, we let V-Chat world builders prohibit custom avatar 
graphics in their space and we let them create custom 2D and 3D graphical avatars 
and spaces. However, this level of customizability was not enough. V-Chat world 
builders were not able to customize the user interface, form of communication, or 
user representation, and this limited the types of groups that were able to build 
online communities in V-Chat. 

Many wanted to add asynchronous communication features like bulletin boards 
and email. Many wanted to be able to support different layouts of the controls: 
some wanted a small 3D view window or no 3D view window at all. Many 
requested a custom user representation, including custom profiles and gestures 
relevant to the community. 

We initially developed the Virtual Worlds Platform to allow scripting of objects 
in the virtual environment. Early on, our world builders requested custom user 
interfaces. To address this demand, we provided a variety of sample user interfaces 
and we let world builders customize different aspects of their environments. This 
became one of the most popular features of the Virtual Worlds Platform. All of the 
twenty or so initial V-Worlds virtual environments customized the user 
interface, and almost all changed the design of the default avatar to incorporate 
custom profile information and custom interactions. For example, in a Virtual 
Trading Pit for commodity traders [21], the world builders simulated the experience 
of being on an actual trading floor. Here, just as in an actual trading pit, end users 
could wear custom jackets and badges representing their companies, and they 
traded virtual financial instruments using custom gestures like buy and sell. In 
HutchWorld [13], a support environment for cancer patients and their caregivers, 
key profile information relevant to the community was added, such as the user’s 
role (patient, caregiver, Hutch staff, etc.) and their bone marrow transplant date. 



Figure 6.5 Custom user interface for Virtual Trading Pit, by Acknowledge Systems. 

To make it easier for Virtual Worlds Platform world builders to customize 
profiles, avatars, and user interfaces, we provided example templates and the 
source code for these templates. We also let world builders completely customize 
the client user interface using standard web tools, and for those interested in 
making deep changes, we released the source code. 


6.8 Summary 

The support of multi-user, real-time interactions, persistent places and people, and 
the ability for end users to author and contribute to dynamic, expressive 
interactions has proven valuable for developing sustainable virtual environments. 
The key lessons we discussed focused on how these elements were used to support 
the social dynamics that emerge in the online community. 

Individuals: Support persistent identity, custom profiles and privacy, and graphical 
user representation. 

Social Dynamics: Support the ability of groups to form and then self-regulate, the 
ability of individuals to have frequent interactions with others, and the ability of 
individuals to build social status and reputation. 

Context and Environment: Support 3D designs, graphical non-verbal 
communication, and the differing needs of different types of communities. 
We should note that our initial belief that adding 3D graphics would appeal to a 
wider audience has not been borne out. In V-Chat, we found that the 3D 
graphics features were used by a subset of text-chat users rather than drawing an 
audience wider than that of text chat. Despite advances in technology over the past six years, 
multi-user 3D environments still have difficulty achieving critical mass, 
particularly in scenarios for practical applications (as opposed to those for 
socializing). We believe that for many communities, the demands on the users’ 
attention and the networking and machine requirements will continue to be a 
barrier to building critical mass. In addition, for many world builders, the 
additional cost of building, supporting and maintaining graphical environments 
prevents them from integrating graphics, particularly 3D graphics, in their online 
communities. Nevertheless, the graphics have contributed positively to the social 
dynamics of our graphical virtual environments. We also believe that the success 
of multi-user graphical games and the social interaction that occurs in these spaces 
will continue to influence the expectations of end users and the design of software 
in educational, business, and social support communities. 


Chapter 7 

The Long-term Uses of Shared Virtual 
Environments: An Exploratory Study 

Alexander Nilsson, Ilona Heldal, Ann-Sofie Axelsson 
and Ralph Schroeder 


7.1 Introduction and Background 

In this chapter we are interested in the long-term uses of shared virtual 
environments (VEs). We begin with a review of research that is relevant to this 
topic. Next, we describe an exploratory trial to examine long-term uses whereby 
four participants - the authors of this chapter - took part in ten one-hour meetings 
in a desktop VE, Active Worlds (AW). The trial allowed us to examine the 
problems and changes that took place over the course of these meetings. The aim 
was to explore both technical and social problems. We found that the main 
technical problems were related to voice communication. We also found that 
technical issues and issues to do with the design of the VE, which have been the 
focus of much research on VEs, were much less important than specific social 
issues, including awareness of each other and how meetings were organized. We 
found, moreover, that there was a process of “adaptation” to the constraints of the 
system, and to how we collaborated in the VE. We conclude by describing some 
implications of these findings for future research into long-term uses of shared 
VEs. 

There has been much research on virtual reality (VR) systems and applications, 
but almost no study of long-term uses. Perhaps this is not surprising since most of 
the uses of VR to date have been short, typically up to half an hour at most per 
session. There are very few settings in which VR has been used for longer periods, 
though we shall discuss the exceptions below. Many of the envisioned future uses 
of VR, however, include spending longer periods of time in VEs. If, therefore, VR 
systems and VEs are to be used for more than short periods, we need to know more 
about the implications of longer-term usability. 

To examine the issues in long-term uses, we designed a simple study (to be 
described further below) which would allow us to obtain some preliminary insights 
into the advantages and disadvantages of long-term usage of VEs. To do this, we 
chose a multi-user setting with four participants. There will, of course, be many 
long-term uses of VR systems in single-user mode - or, for shared VEs, with two 
users. The reason for our choice is that we envision multi-user uses for small 
groups as a popular long-term application of VR (tele-meetings, training, 
collaborative work at a distance, and the like). 

We also chose the format of ten one-hour weekly meetings. In a sense, 
construing “long-term” in this way is arbitrary, but we found that this length of 
time was suitable for collaborative sessions, and a long enough period to 
experience noticeable changes over time. We also emphasized a scenario that is 
likely to be realistic - an unstructured meeting. 


7.2 Previous Research 

The development of VR has brought many visions of what the future long-term 
user might face. Yet there are almost no actual studies of the implications of 
regular, repeated, or long periods of usage. Here we shall describe work that has 
been done so far in relation to long-term effects and see that the issue explored in 
this chapter - the social implications of long-term usage and adaptation to 
interacting with others in VEs - is missing. 

Past research has resulted in a long list of possible factors in relation to the 
long-term usage of VR, but these have not been systematically studied. These include 
nausea [1, 2], human factors and ergonomics [3, 4, 5, 6], importing virtual behavior 
into real-life situations [7], and addiction (frequently discussed in the popular 
literature on VR). In technologies related to VR, the following have been 
discussed: computer and internet “addiction” [8, 9], “cultivation” theory for 
television (users getting a distorted view of reality after much exposure) [10], 
computer-mediated communication or virtual networks replacing or supplementing 
real ones [11], and so on. 

Let us look briefly at some ways of classifying these issues. In relation to VR, 
Wilson [12] has categorized the effects as follows: 1) physical effects on vision 
and body; 2) disorientation and nausea; 3) behavioral change; 4) addiction and/or 
escape from reality; and 5) ethics and morality when creating worlds. For items 
1 and 2, the issues in VR will depend on which systems are likely to be used in the 
future and what their technical features are. Items 3 and 4, on the other hand (the 
larger-scale issues about the relation between media and reality), are unlikely to be 
resolved by VR if they have not been addressed in relation to well-established 
media. (There are still debates, for example, about whether watching video 
violence really causes aggression, or whether electronic games cause addiction.) 
Other issues (5) are of a type that is not amenable to social scientific investigation, 
even though they must be taken into consideration by researchers. 

We shall argue that a central topic that is left out by this list and by previous 
research is adaptation. This has so far been discussed in relation to VR only in 
limited ways. There have been debates, for example, about whether it is possible 
for users to “adapt” to VEs, for example by filling in missing or misleading 
information in the VE. Another example is when adaptation is discussed in 
connection with motion or simulation sickness, which has received a lot of 
attention in VR research: recent work by Biocca et al. [13] states that sensorimotor 
conflicts either result in incorrect judgments, create simulator sickness, or let the 
user adapt to the new environment. This area is important because it is accepted 
among researchers that motion sickness can be reduced or even avoided by 
adaptation. This is the case, for example, if the user adjusts quickly to sensory 
rearrangements, i.e., has high adaptability [14]. 

It is also well known, however, that adaptation can cause problems. Adapting to 
a VE creates after-effects when returning to the normal physical situation (see, for 
example, [15]). According to Stanney et al. [16], many of the after-effects, such as 
flashbacks, fatigue, malaise, motion sickness, eye strain and drowsiness result from 
adaptation to a VE. This problem has partly been tackled by Welch [17] who 
describes how adaptation and “un-adaptation” (in order to avoid after-effects) can 
be improved. However, these aspects are mainly related to HMDs (Head Mounted 
Displays) and they relate to sensory and physiological adaptation. As we shall see, 
they do not apply to our trial in this way, but it is nevertheless important to 
mention them because adaptation in some way is bound to apply to all long-term 
uses of VEs. Here we shall use the term “adaptation” in a more general sense, to 
indicate the way in which our behavior changes noticeably in response to, or 
accommodates itself to, the VE setting. 

The only other trial that we are aware of in which VR has been used on a long-term 
basis is the COVEN (Collaboration in Virtual ENvironments) trial. This trial 
involved multi-user communication tasks in a desktop setting repeatedly over 
weekly sessions, more or less continuously over four years, in order to develop and 
evaluate distributed VE technology. Long-term use was not addressed specifically 
as an issue in the COVEN trial, but this issue arose because the trial consisted of 
regular multi-user sessions (for details see [18]). 

According to reports and interviews with key participants, the trial showed that, 
in an “applied” setting, neither simulation sickness nor ergonomic problems were 
of major concern. (The trial used only desktop systems.) Interestingly, the 
participants seemed to learn skillful ways of navigating and they adapted to the 
non-natural way of manipulating objects. They reported no difficulties in these 
areas. Despite this, they found the trials mentally tiresome and 
stressful. Problems of orientation in the VE continued, and the participants also did 
not seem to get used to inconsistencies in world view between the users. In order to 
overcome the difficulties in communicating with others, they added special 
functions (a button to form a group, graphical sound waves to indicate visually 
which avatar was speaking, and so on). 

The last issue points to one of many communication problems during the trials, 
and adds another topic to the ones mentioned above in the literature review. There 
has been very little research on how people experience each other when meeting or 
working together in VEs. Social implications are a central missing piece in the 
COVEN trial and in other research. So far, collaboration problems have mainly 
been dealt with technically, without consideration for interpersonal dynamics. 

Types of shared VEs which have been used for longer periods are internet-based 
VEs for socializing, which are discussed in several chapters in this volume. 
However, again, issues around long-term use have not been studied for these 
systems, though their usability has been examined (see Chapter 6). Another 
difference to bear in mind is that the main purpose of these systems is socializing, 
whereas the main purposes of shared VEs envisaged in this chapter are 
collaborative and practical. Finally, we used AW with audio, which is not part of 
the standard online system. 

As we can see from our review of the related literature and previous trials, most 
of the research has focused on the physical effects of the visual aspects of VEs and 
on the body (or the sensory system), and the difficulties or problems have mainly 
been dealt with technically. We envision one of the most promising application 
areas - especially for long-term usage of VEs - to be settings where people meet 
and do things together. In long-term uses of shared VEs, or of computer-mediated 
communication (CMC) more generally, the most interesting issue may not be 
human-machine relationships, but human-human relationships. 


7.3 Method and Study Design 

The aim of this study was to remedy some of the gaps just mentioned; that is, to 
analyze small group interaction in graphical VEs with a focus on how these aspects 
change over time. To identify these aspects, we set up a number of meetings in the 
online VE Active Worlds (AW) (see http://www.activeworlds.com), where we, the 
four authors of this chapter, got together for ten one-hour meetings, with one 
meeting per week, to carry out various activities together. Apart from the VE 
system, which is a graphical system with text communication, we also used a 
complementary audio system for voice communication. 

After each meeting we filled out a pre-prepared questionnaire. The questions 
concerned the following aspects: (1) navigation, orientation and manipulation; (2) 
communication; (3) collaboration; (4) adaptation; (5) inconsistencies; and (6) 
presence and copresence. Most of the questions consisted of a quantitative part 
which let us evaluate the different aspects on 7-point scales (e.g., where 1 was 
extremely bad and 7 was extremely good) as well as a qualitative part which 
consisted of open-ended questions to capture the details of the experiences. 

After each session, once we had filled out the questionnaires, one of us (we took 
turns in doing this) collected them and summarized the results. We 
should mention that although the questions were partly quantitative, we will not 
present these results here because they cannot be considered valid, especially 
in a small group using self-reports. Instead, we used the information from the 
scales in our questionnaires to give us easily comparable indications about where 
our answers differed and how they changed over time. 

It should also be mentioned that we occasionally and irregularly had real-life 
meetings between the virtual ones, to debrief about the virtual meetings and to plan 
for coming ones. These meetings, which were of various lengths and content, took 
place more often towards the beginning rather than the end of the study. 

As to the system, we used the 2.2 version of the internet-based AW system. In 
choosing AW, we were aware of the fact that many would not regard this as an 
appropriate means to assess VR and VEs since AW is “desktop” rather than 
“immersive” VR. But if we define VR as a computer generated environment in 
which the user feels present and with which s/he can interact (see Chapter 1), then 
AW can be considered VR. A number of studies indicate that AW users behave as 
if they are actually present in the environment and also interact (e.g., they build 
and explore) a great deal with the environment. 

Another point to make here is that there are several advantages in studying the 
AW system as compared to other systems. The first is that this is one of the few 
widely-used VEs that has regular and long-term users. Axelsson and Schroeder 
[19] interviewed ten long-term users of AW, where long-term was defined as 
spending more than two hours per day for at least two years in this VE. Such users 
are by no means uncommon, as we also know from an interview with the system 
developers who carried out a survey of users. 

Second, whereas many multi-user systems are under development, this system 
has been stable for many years. This point deserves to be stressed: trials in VEs are 
typically carried out on systems that have not been used regularly and therefore 
either have technical problems or they have not been extensively adapted to user 
needs. The AW system has had a number of features - both technical and “social” 
- added to it over the years, both by the developers and by users. 

The third point (already mentioned) is that it is possible for the user to modify the 
VE - to build - within the VE. This will be of considerable importance to long-term 
uses of VEs, since VEs which the user cannot modify him/herself are likely to 
have uses that are highly constraining to the user (to appreciate this point, imagine 
a “real world” setting in which you spend a long time and in which you are unable 
to modify the environment). Anderson et al. [20] found that, when they allowed 
users’ input into the modification of the shared VE meeting space over a series of 
meetings, they were able to improve the space considerably to allow for more 
effective interaction. 

Apart from using AW as it is normally used by its user population, as a graphical 
system with text communication, we also used a complementary audio system for 
voice communication (VoiceCREATOR, the recently discontinued audio system 
from www.hearme.com). This allowed us to speak one at a time, when pressing 
down a keyboard button. The common use of AW is with text communication 
(“chat”), although there are other internet-based systems that are similar to AW 
that allow voice communication such as Onlive Traveler (see www.ccon.org and 
Chapter 2) - but they do not allow building in the VE by participants. Being able to 
modify the VE and use sound made this system similar to DIVE, which was used 
in the long-term trial mentioned above (COVEN). 

This was an exploratory and time-consuming trial, and thus we decided to 
undertake the trial with ourselves as the objects of study. The disadvantage with 
this procedure (shared by the COVEN trial) is that we are, of course, not 
“unbiased” subjects, subjects that are randomly selected or selected as “typical 
users”. The advantage is that we are able to comment upon our experiences in 
terms relevant to the study, especially as we are all researchers in the area of social 
aspects of VR. 

We should mention another point that deserves to be stressed because it is 
unusual in studies of this type: all four participants knew each other very well 
before the trial - we have all been working in the same department for more than a 
year and know each other both as colleagues and friends. This is uncommon for 
studies of collaborative uses of computers, which are typically based on 
participants working together for the first time. (COVEN participants also knew 
each other, but to varying degrees and not as day-to-day colleagues). A further 
disadvantage of our study is therefore that it is based on four participants who have 
worked in the research area and who use self-reports as the basis for the results. 

On the other hand this provided the advantage of being able to pursue new 
interesting questions as they arose during the trial. However, the main reason for 
choosing to study four participants who knew each other well is that we wanted to 
avoid the need to become familiar with each other in “real” life as a factor, which 
could work against effective interaction and collaboration. Put differently: many 
researchers have made the informal observation that it is difficult to engage with 
“strangers” in VEs. This is not the case here in the double sense that we knew each 
other and had a common background. The aspect that the participants should know 
each other is realistic for many envisioned long-term settings. 

An important difference between the four participants was that two of us were 
familiar with the VE, having spent more than 100 hours in AW. The other two had 
spent a total of less than 2 hours in the VE previously, and we shall see that this is 
an important difference. To sum up, our study design and method do not fall into 
existing categories: it is not experimental, nor participant observation, nor is it a 
protocol-based evaluation of a new system. Instead, we borrowed from all of these 
approaches and deliberately set out in this eclectic manner in order to avoid the 
limitations of other methods. 


7.4 Sessions and Tasks 

The idea of the set-up of a series of unstructured meetings was to get as close as 
possible to a real-world situation. We did not want the development of the 
meetings to get locked into a fixed laboratory setting, but instead chose to let the 
experiment take its own course. For example, if it was better to have meetings in 
the real world before the session in a VE, then we did so. Or again, if we found a 
task too complicated or too boring, then we changed to do something else. 

We completed ten one-hour sessions in the VE together, one every week during 
consecutive weeks. Table 7.1 briefly describes what happened during the course of 
these sessions. The tasks can be classified into four different types: (1) Planning 
and decision-making; (2) Teaching and learning; (3) Collaborative building; and 
(4) Joint exploration. As we had not fixed on carrying out any particular activities 
or tasks, what we did during the sessions varied from one session to the next. 

It can be seen that our time was spent in a rather unplanned way, mainly building 
and exploring. Overall, however, we became more at ease with interacting with 
each other in this VE over time, and in using the system for different purposes 
(building, discussions, teaching, etc.). 

We can mention here that the number of tasks we performed - which may be a 
misleading way to put it, since much of our time was spent socializing, deliberating 
and exploring - may not seem very large for ten hours. It could equally be said, 
however, that we accomplished a lot for such a short time, and it is perhaps in the 
nature of long-term “unstructured” trials that the evaluation of “task performance” 
is bound to be subject to different interpretations. 
Table 7.1 Sessions and Tasks 

Session 1 
Tasks: In the first session, we became accustomed to interacting with each other 
in the VE, had informal discussions about how to use the system, and received 
first lessons in how to create and manipulate objects (only one of us was 
competent at this skill). 
Task types: Planning and decision-making; Teaching and learning 

Session 2 
Tasks: Session 2 was spent learning how to build and informally discussing what 
we should build. 
Task types: Teaching and learning; Planning and decision-making 

Session 3 
Tasks: We planned what to do next, and did a puzzle-solving task (a Rubik’s 
cube-type puzzle which we had studied in another VR/VE setting), with two of us 
doing the puzzle and two onlookers. 
Task types: Planning and decision-making; Collaborative building 

Session 4 
Tasks: After deciding to build a water-slide together, we split up to get ideas and 
to look for objects to build with. We also spent time on building instruction and 
building. 
Task types: Joint exploration; Teaching and learning; Collaborative building 

Session 5 
Tasks: We continued in this increasingly organized way, choosing a leader, and 
dividing our new task (building a house together) between two groups who 
carried out different building activities. 
Task types: Planning and decision-making; Collaborative building 

Session 6 
Tasks: Session 6 continued the building process, adding an entrance, exits and 
other features to our house. 
Task types: Collaborative building 

Session 7 
Tasks: During session 7 we continued building, but also went exploring in worlds 
in the system to find special features that we wanted to add to our house. 
Task types: Collaborative building; Joint exploration 

Session 8 
Tasks: Session 8 was spent in building, entertaining a stray visitor to our new 
house, and ended with some fun: running over an obstacle course consisting of 
stairs. 
Task types: Collaborative building 

Session 9 
Tasks: We explored: we went together to three different AW worlds, finding out 
about them and trying to get a common understanding of these different places. 
Task types: Joint exploration 

Session 10 
Tasks: The tenth and final session was a slide presentation and discussion of our 
sessions which took place inside our house. 
Task types: Teaching and learning; Planning and decision-making 


7.5 Results 

Overall, we would like to say that there are no major obstacles to using AW for 
long-term collaborative meetings. The technology is adequate for the task and the 
meetings we have described, and there are no intrinsic technical problems that 
cannot be solved which would prevent collaborative long-term uses. The 
usefulness of this technology for long-term meeting uses is of course very task 
dependent (see [21] for AW in text-mode, and [22] for a general review of 
computer-mediated communication and groups). 

Figure 7.1 A screenshot from the house that we built together. 


7.5.1 Technical Problems 

The most common technical problem during the trial in relation to long-term use 
was audio (for the less technical aspects, see also section 7.5.4). This is very 
similar to what was found in the COVEN trial according to Anthony Steed, who 
was coordinator of COVEN (personal communication, 25.01.2001). The COVEN 
trial also found that audio problems were the main problem for effective 
collaborative meetings - though for COVEN, the stability and the functioning of 
the system also presented a persistent problem because COVEN was a pilot 
system. In our case, we had a tested and stable system, though the audio, which we 
had added to the system, remained a major problem (this also applies to Onlive 
Traveler, a stable system with audio discussed in Chapter 2). 

Interestingly, the perceived sense of how poorly the technical system worked 
seemed to diminish over the series of sessions. This occurred even though the 
function of the VR system was roughly the same throughout the trial. It is likely 
that users’ view of the system improves as the participants in the 
collaborating group get used to working together. There were a number of other 
technical problems, especially during the first sessions (logging in, joining each 
other, and the like) but these were not persistent over time and they are minor 
compared to the audio problem. Finally, there were no “after” or “ill” effects. This 
might have been expected for a desktop system, but it is worth mentioning in the 
light of the attention that this issue has received in the literature on long-term uses. 


7.5.2 Presence and Copresence 

We know from previous studies that many factors affect presence and copresence 
in shared VEs (for example, [23, 24]). Here we can mention only some aspects 
related to this that came up during our trial. 

The system we used allowed both voice and text communication. Some answers 
in the questionnaires indicate that using text gives a better sense of presence. This 
may be because typing text is more related to communication with others via 
computers, while talking without seeing each other (apart from the representation 
of others) is more linked to speaking on the telephone. It may also be that text 
communication, unlike audio communication, never caused problems. (We did not 
investigate this systematically enough to note changes 
over time.) 

As will be discussed below, structure is very important for successful 
collaboration in VEs. To express a clear goal is one of the things that enhance 
collaboration. However, a clear goal may also increase presence and copresence. 
Further, a well-defined goal may also increase the enjoyment of the session. From 
the questionnaires it emerged that the most positive aspect of collaborating in a VE 
is the sense of being able to do things - solve tasks and accomplish results - 
together. 

The conclusions here are not straightforward, however: goal orientation and task 
focus are important for good collaboration, and they seem to enhance copresence 
via a strong notion of doing things together. On the other hand, task focus reduces 
the focus of attention on others, which leads to a reduced sense of awareness of 
others - and perhaps less copresence (we return to this below). 


7.5.3 Avatar Appearance and Personal Characteristics 

Avatar appearance was not considered important for representing the 
personalities of the participants during the trial. This was despite the fact that the 
participants needed to orient themselves to the other avatars. Avatar appearance 
was only changed occasionally, mainly for fun, and little attention was paid to this. 
The result that avatar appearance is of minor importance in the course of several 
meetings has been noted in other studies (e.g., COVEN), and we can conclude that 
under the collaborative circumstances in our trial (when participants knew each 
other well), avatar appearance is of negligible importance. 

Even though avatar appearance is not important for collaboration, users show 
their personalities. It is clear that although initially, one’s “real” personality or 
characteristics were “backgrounded”, personalities increasingly came into the 
foreground over the course of several online meetings: personal characteristics 
“shone through” over the course of time. For example, there were some complaints 
that things were taking too much time, some about not having a clear goal, and 
sometimes great enthusiasm, which seemed to relate clearly to our “real” 
characteristics. This point is striking because we were not aware of it during the 
meetings themselves, but this feature emerged clearly when reading the 
questionnaire responses, especially for later meetings. 


7.5.4 Communication 


7.5.4.1 Voice communication 

As mentioned, the modified version of AW used in our trial also allowed voice 
communication in addition to the standard text and gesture possibilities. However, the 
person speaking needed to push down a button and only one person could speak at 
a time. This meant that listeners needed to get used to waiting for others when they 
were talking, the speaker did not get any verbal feedback while speaking, and strict 
turn-taking in the conversation was needed. There was no significant change in 
terms of ease in handling this - except that we adjusted to it over the course of 
time. 

Turn-taking problems have been widely discussed in research on computer-mediated 
communication, but the key problem of communication when 
collaborating is that it is very difficult to express ideas, emotions and opinions in a 
VE: more effort is needed than in the real world. You have to be very explicit in 
what you say, and refer more clearly to the objects and actions that you mention 
than in the real world. This is something that you get used to - and even though the 
difficulty does not go away over time, participants improved in expressing 
themselves in a suitable way. 


7.5.4.2 Other modes of communication 

It is clear from the trials that text is used in two ways: (1) as a back-up when other 
(voice) communication fails (this includes both when audio fails and when you 
can’t say anything because the audio-channel is busy); (2) less commonly - to 
emphasize or clarify a word or a topic. 

The whisper function (text directed to one person only) was hardly used at 
all. (This is a feature which AW included at the request of users of the system, and 
it is commonly used among regular users of the system.) When it was used, the main 
purpose was to avoid disturbing participants with issues not relevant to others. 
Also gestures or extra functions for non-verbal avatar expression (like “happy”, 
“dance” and “wave” buttons which allow you to change avatar appearance in AW) 
were used very rarely. They were tried out initially, mainly for fun, but more or 
less died out over the course of the sessions. 
7.5.4.3 Awareness 

Awareness, as we know from other studies [25], affects collaboration, 
communication, presence and copresence. The lack of social cues reduces 
awareness. This topic has been much discussed by communication researchers, 
system developers and usability architects. 

In our trial sessions the participants were not really looking at each other. They 
had difficulties in knowing where the others were and keeping track of what they 
were doing. These difficulties also reduce the possibilities for coordination and 
collaboration (and coordination, as we shall see below, is very important in order 
to have good meetings in a VE). 

The difficulties of awareness are likely to remain while using the system over 
longer periods. However, they are considered less troublesome over time, 
consistent with the growing ease that has been mentioned in other places in this 
chapter. Skilled users seem to bother a lot less about a low degree of awareness; 
they have perhaps become used to not being aware of others. They may have 
become acclimatized to the notion that you cannot fully be aware of others and feel 
comfortable with that. The same applies to orientation and to the inconsistent or 
perceived inconsistent behavior of others (for example, flying, passing through 
objects, and so on). 


7.5.5 Collaboration 

We will focus here on only one key issue: structuring the session. One of the main 
difficulties when collaborating is to get joint agreement in a group. It is more 
problematic to reach consensus, and discussions take a longer time (again this is 
task-dependent, and for structured meetings, CMC may also improve task 
performance in comparison to face-to-face meetings, see [22]). It is difficult to 
know when somebody would like to take a turn speaking or when they are satisfied 
with what has been said. Again, the lack of non-verbal cues is a key barrier. 

Our trial shows a need for structuring sessions. Even though we had intended the 
sessions to be unstructured, the need for greater coordination became increasingly 
obvious over the course of the sessions. Again, this finding is similar to the 
experience in COVEN. It is difficult to meet and collaborate in a VE, and efficient 
joint work can only be carried out with good coordination. 

As mentioned in the previous section, coordination is a problem because of 
awareness. Although awareness of others is not a problem as such, it becomes a 
problem in trying to organize your activity with others (does the other person want 
to do what you want to do? Which tasks are best divided into groups, and which 
can be done together?). These awareness issues in organizing activity in small 
groups are much more urgent than in real meetings. 

One way of structuring the collaboration is to keep down the number of 
participants working together. It seems that pairs often work better than groups of 
three or four. Several questionnaire responses mention that it was easiest to solve 
the tasks separately. Again, this feature is rather task dependent. 
It was very valuable to have a leader organizing and driving the meetings 
forward. There are also indications that the lack of cues causes difficulties for 
leadership; for example in not providing the leader with adequate information. 

The aspects of collaboration that are clearly task dependent are problems with 
orientation and expressing ideas related to space. When discussing common 
objects there is a need to understand the space, for example, knowing where to put 
an object or in which direction to look. (This is perhaps the major difference in 
terms of collaboration with immersive VR systems, see [26]). There was some 
confusion and difficulty in understanding each other, but this occurred mainly in 
relation to building things together and exploring places together. In relation to 
objects, there were some problems with object ownership while building, but this 
was less of a problem than expected. 

Interestingly, when carrying out collaborative practical tasks, it seems that the 
best results were achieved when everyone was working on their own - and next 
best when working with one other person - with little awareness of the others. The 
implication that needs to be investigated is: which activities can be effectively 
carried out collaboratively? 


7.6 Conclusions 

One of the ideas guiding our research on long-term uses was the notion of 
“adaptation”: how do users “adapt” to the system in the sense that they change 
their behavior in order to become used to a new setting? Over the course of our 
sessions, the perceived sense of difficulties diminished over time. Since the system 
did not undergo any modifications (the conditions were the same throughout the 
trials), the changes were made by us, the users, by adaptation, i.e., changing our 
approach towards the technology and changing the interaction technique within the 
group to manage the social interaction better. 

This adaptation also took the form of increasingly coping with the need for 
greater coordination (and the lack thereof) - because of the difficulties of 
communication in terms of understanding each other’s ideas and responses. The 
lack of common understanding became more of a problem rather than less of one, 
at least in terms of our awareness of this. 

Further evidence for adaptation comes from the questionnaire result that the two 
novices felt less present than the two skilled or experienced users; they were more 
eager to modify the VE, more aware of its inconsistencies, and reported more 
frustration than the skilled/experienced users. 

The processes we have described can thus be understood as an adaptation to the 
characteristics of the system, to its constraints and possibilities, and to the 
increasingly shared expectations of the people using this system. 

This brings us to a conclusion that relates to the AW system, but which will also 
apply to other systems. It has been known for some time that there are problems 
with the AW system that have never been resolved, such as the crude building 
techniques, inconsistencies in orientation between different worlds, visibility 
problems, etc. (see the discussions by Bernie Roehl in VR News, a commercial 
newsletter, especially volumes 8:1 (1999) and 9:5 (2000)). The developers have 
indicated that they intend to make a number of improvements, for example the 
possibility to change one’s avatar appearance, and upgrading the technical 
capabilities of the system. These changes do not address the issues around social 
interaction that we have identified here. 

The COVEN trial also mainly addressed the technical issues of the system, but 
the improvements arising from long-term usage over time still focused on usability 
in the technical sense. What our study demonstrates is that, for the long-term uses 
of shared VEs, it is necessary instead to make improvements that support and are 
guided by an understanding of sociability, and how social interaction changes over 
time. Further, these need to be taken into consideration earlier in the design phase 
rather than later, especially since the study of long-term uses shows that some 
technical problems are “backgrounded” over the course of time, while others - 
mainly social - are “foregrounded”. 

Examples of improvements that follow from this analysis might include: 

• a “join my perspective” function to allow better co-building and joint 
exploration; 

• an “I am waiting to speak” indicator in one’s avatar appearance to allow 
better turn-taking in small groups; 

• an “I am currently not available” indicator in one’s avatar appearance (this 
was also implemented in the software of the COVEN trial, with the avatar 
lying “sleeping” on the ground); 

• an “arrange this group in the optimal configuration to see each other” 
function to improve meetings or instruction/teaching sessions; 

• above all, more explicit guidelines for developers of systems for long-term 
users concerning the conditions under which one needs to express one’s 
thoughts or feelings more explicitly in a VE (as opposed to face-to-face), 
how this can make a meeting or socializing situation difficult, and when 
users can easily “adapt” to these conditions. 

No doubt there could be many other suggestions for improvements, but at least we 
have made a start in exploring the social dynamics of the long-term uses of shared 
VEs. 
Chapter 8 

Social Influence within Immersive 
Virtual Environments 

Jim Blascovich 


8.1 Introduction 

[Social psychology is] an attempt to understand and explain how the thought, 
feeling, and behavior of individuals are influenced by the actual, imagined, or 
implied presence of others. Gordon Allport, 1954 [1] 

There is nothing so practical as a good theory. Kurt Lewin, 1951 [2] 

Nearly all social psychologists subscribe to Allport’s definition [1]. This broad 
definition has taken hold because it captures the universe of social influence 
processes, making clear that we can influence each other implicitly as well as 
explicitly. We do not need to experience the actual physical presence of others to 
influence them or for them to influence us. This long-held and well-validated 
assumption in social psychology provides a firm basis for assuming that social 
influence can occur within digital virtual and immersive virtual environments, 
whether the “others” present are computer agents or human avatars. Social 
psychologists also subscribe to Lewin’s dictum [2] regarding the value of theory 
for applications. 

Social psychologists realize that explicit theoretical models of social behavior, 
rather than the implicit notions we all carry as human beings, serve best in the 
development of practical applications involving social influence. All of us 
(including social psychologists) find it difficult to escape the tenets of our implicit, 
common sense theories of social behavior. Sometimes our intuitions prove true. 
However, applications such as virtual social environments will generally prove 
more successful if based on sound, explicit theory. 

Although social psychologists do not have a monopoly on understanding and 
facilitating social influence within digital immersive virtual environments (IVEs), 
they can bring theoretical sophistication, methodological expertise, and empirical 
data to bear on the topic at hand, the social lives of avatars. Complementarily, by 
challenging social psychological theories, IVE developments can provide valuable 
and unique contexts for testing social psychological models and increasing their 
theoretical sophistication, particularly models relevant to social influence. In 
turn, according to the Lewinian dictum, such increased theoretical sophistication 
can improve the practicality of digital IVE design and applications. 

This chapter expands on these notions. First, I explain our terminology so that 
semantics do not obfuscate the more important points I try to make. Second, I 
provide a detailed description and revision of our theoretical model of social 
influence within digital IVEs [3]. Third, I report research that has tested our model, 
provoked revisions, and added explanatory value to it for social psychology. 
Finally, I speculate on the long-term prospects for improvements in the social lives 
of avatars. 


8.2 Terminology 

Like many others, I have struggled to define important terms relevant to IVEs and 
social behavior explicitly and precisely. Part of this struggle stems from the 
inherent fuzziness or elusiveness of the constructs themselves, part from the 
historical, albeit brief, use of some labels, and part from conflicts between the 
connotations of common language labels (e.g., “presence”) and the desire for 
explicit precision. When the interchanges of interested theorists, researchers, and 
application designers devolve into statements such as, “I think presence is such and 
such...” “No! Presence is this and that...” little progress results as theoretically 
important differences in competing constructs underlying these arguments get lost 
in the competition for labels. Unfortunately, the relatively young history of 
computer-based IVEs has precluded much in the way of conventional agreement 
regarding the definitions of important terms. Furthermore, some semantic 
confusion exists because of the use of terms relevant to non-social IVEs. Here, I 
use various terms as convenient labels for key constructs. I ask the reader to keep 
in mind that the definitions are more important than the labels. 


8.2.1 Immersive Virtual Environments 

Fortunately, the oxymoron “virtual reality” and its cousin “immersive virtual 
reality” have all but disappeared from the lexicons of researchers. Scholars now 
speak of virtual environments without confounding “reality” and “virtual”. 

My colleagues and I define and use the term virtual environment (VE) to refer to 
an organization of sensory information that leads to perceptions of a synthetic 
environment as non-synthetic. Virtual environments may exist on the basis of 
organized information via any sensory modality. For example, audio recordings of 
musical performances and their rendition via playback equipment have provided 
virtual musical experiences for more than a century. More recently, the 
combination of recorded and rendered instrumental musical tracks allowing online 
vocal participation of users has created interactive virtual musical experiences (i.e., 
karaoke). Cartoons have provided entertaining multi-sensory virtual environments 
to generations of children (and adults). 

We use the term IVE to refer to VEs that organize sensory information in such a 
way as to create a psychological state in which the individual perceives himself or 
herself as existing within the VE (i.e., as being present or having “presence” in it; 
see below). 

Despite the newness of the label, IVEs have been around for quite some time. 
More than half a century ago, “3D” film represented initial attempts by the motion 
picture industry to provide visual immersive virtual experiences to the movie-going 
public. Modern theme parks abound in IVE experiences, including many quite 
compelling ones. “Haunted houses” on Halloween night in the United States 
represent attempts to immerse “trick or treaters” within interactive IVEs that have 
been created by dedicated individuals for decades. The film The Truman Show 
shows this literal “hammer and nail” approach to IVEs at its logical extreme, 
depicting an elaborate IVE designed to contain the entire life of its main character 
and broadcast it for a television audience’s entertainment. 

Social psychologists have used “hammer and nail” IVEs for decades to increase 
the ecological validity and experimental impact of research scenarios. Famous, 
sometimes infamous, ones include Milgram’s “teaching laboratory” in which he 
conducted his classic obedience experiments [4]. Similarly, Zimbardo created his 
compelling “prison” in which he studied deindividuation processes [5, 6]. Even I 
created one, a “casino” in which I studied group influences on risk-taking among 
blackjack players [7, 8, 9]. 

In the past, interactive IVEs have proven relatively costly to construct. The 
Truman Show depicted what it would take to develop an IVE without modern 
computer technology. For better or worse, we can now create and implement IVEs 
digitally using relatively inexpensive computer and tracking equipment [10]. The 
popular film The Matrix represents a kind of digital Truman Show, depicting digital 
IVE technology at its logical extreme. Articles in various professional journals 
describe constantly improving, ever cheaper digitally-based technologies for 
creating IVEs. 


8.2.2 Presence and Social Presence 

Much discussion has taken place in the literature, including this volume, regarding 
the concept of presence. Although a universally accepted definition of presence has 
not evolved, scholars and investigators agree that this psychological construct is 
central to understanding human experience and behavior within VEs and IVEs 
[11]. As suggested above, we define presence as a psychological state in which the 
individual perceives himself or herself as existing within an environment. 

As outside observers, we have little difficulty understanding how an individual 
can experience presence within physical environments. If we think about it a bit, 
we understand that the individual may not experience total presence within the 
physical world at any particular point in time. People daydream, sleep, and imagine 
themselves in other settings even while in a given physical environment. (As a 
university professor I have called on many students to answer questions who, while 
physically present, were not psychologically present.) So, when the competition for 
the individual’s presence is between the physical world, whose sensory 
impingement on the individual is physically direct, and a non-physical (i.e., 
imagined) world, whose sensory impingement on the individual is non-existent 
(i.e., imagined), we see little problem understanding presence. 

As outside observers, however, many have trouble understanding how an 
individual can experience presence within a virtual world. Here the competition 
for the individual’s presence is also between a world whose sensory impingement 
on the individual is physically direct (i.e., the synthetic or virtual one), and one 
whose sensory impingement is largely imagined (i.e., the physical one). If we 
consider the case of “hammer and nail” IVEs, we have little difficulty appreciating 
an individual’s presence in the immersive virtual world. It is only when the 
immersive virtual world is digital that we seem to have problems. In my view, 
these problems are more in the minds of scholars than in the minds of the actors 
themselves. The more immersive - and hence more compelling - the digital 
environment, the more presence individuals will experience. They will still not 
experience total presence within the digital IVE, just as they do not experience 
total presence within a physical environment. 

Parallel to presence, we define social presence as a psychological state in which 
the individual perceives himself or herself as existing within an interpersonal 
environment. Once again, in physical social worlds and “hammer and nail” 
immersive social worlds, we have little difficulty comprehending social presence. 
Social presence, like presence itself, need not be total. Individuals “tune out” 
others in their physical presence all the time. Furthermore, social presence, like 
presence itself, need not be physical. Allport’s definition makes clear that the 
presence of others may be implied or imagined. We believe that social presence 
differs little among digital IVEs, “hammer and nail” IVEs, and normal physical 
environments. 

The question, “How to measure presence?” has occupied many IVE researchers 
for some time. As with most psychological constructs, the measurement of 
presence may be subjective (i.e., based on self reports) or objective (i.e., behavioral 
or physiological). IVE researchers have focused for the most part on subjective 
techniques. This work appears methodologically valid. However, because presence 
does not necessarily require awareness, subjective assessment techniques are 
limited to situations in which individuals are aware of their perceptions of 
presence, or to post hoc reconstructions of those perceptions. This problem can be 
overcome by employing online objective indicators of presence (see below). 


8.2.3 Agency 

Human representations within digital IVEs have highlighted issues of agency. We 
define agency as the extent to which individuals perceive virtual others as 
representations of real persons. I argue that agency is best conceived as a 
continuum, anchored on the low end by agents (i.e., cyborgs) perceived to be 
completely controlled by non-human means, and on the high end by avatars 
perceived to be completely controlled online by real humans. As a continuum, agency 
can be hybrid whereby individuals perceive virtual others to be partially controlled 
by non-human and human means. By convention, we reserve the term avatars for 
the high agency end of the continuum. 


8.3 A Threshold Model of Social Influence 

We have developed a threshold model of social influence to help us understand and 
explain the social life of avatars within digital IVEs [3], though we suspect the 
model applies to VEs as well. The development of this model has been dynamic, as 
is the case for most social psychological theories. Hence the model will likely 
change in the future. I describe its current instantiation here. 

Our threshold model assumes that social influence increases as a positive 
function of social presence. The more individuals perceive themselves as existing 
within interpersonal or social environments, the greater the social influence. 
According to our model, social presence varies as a function of several factors. 
Two of these are external to the target of the social influence and two are internal. 


8.3.1 Factors External to the Target of Social Influence 

The major external factors include agency and behavioral realism. We classify 
agency and behavioral realism as external factors because they are based on 
information, including sensory information, external to the individual. (Of course, 
this information leads to perceptions of agency and behavioral realism that one 
could argue are internal to the individual, but that argument is not critical here.) 

Agency, as defined a few paragraphs above, is dependent on information 
available to the individual as to the type of control underlying representations of 
apparently human others within a shared social IVE. Behavioral realism refers to 
the degree to which virtual objects, including humans within digital IVEs, appear 
to behave as they would in the physical world. Figure 8.1 depicts the relationships 
among behavioral realism, agency, and social presence specified by our revised 
model. 

As depicted, social presence is a positive function of agency and behavioral 
realism. According to our model, to the extent that agency is high, social presence 
should increase. Independently, to the extent behavioral realism is high, social 
presence should also increase. We have depicted social presence as a linear 
function for ease of discussion, though this does not necessarily have to be the 
case. 
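As a minimal sketch, the linear form depicted in Figure 8.1 can be written as a weighted sum. The 0-to-1 scales and the equal weights below are assumptions for illustration only; the model states only that social presence rises independently with agency and with behavioral realism, and the linear form is itself a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualOther:
    """How a participant perceives a virtual other; both scales are hypothetical, 0 to 1."""

    agency: float              # 0 = perceived as an agent, 1 = perceived as an avatar
    behavioral_realism: float  # 0 = unrealistic behavior, 1 = behaves as in the physical world


def social_presence(other: VirtualOther,
                    w_agency: float = 0.5,
                    w_realism: float = 0.5) -> float:
    """Linear, additive reading of the model: each factor raises presence independently.

    The weights are illustrative assumptions, not estimates from the research.
    """
    for value in (other.agency, other.behavioral_realism):
        if not 0.0 <= value <= 1.0:
            raise ValueError("agency and behavioral realism must lie in [0, 1]")
    return w_agency * other.agency + w_realism * other.behavioral_realism


if __name__ == "__main__":
    crude_avatar = VirtualOther(agency=1.0, behavioral_realism=0.2)
    crude_agent = VirtualOther(agency=0.0, behavioral_realism=0.2)
    # Same behavioral realism, higher agency: higher social presence.
    print(social_presence(crude_avatar), social_presence(crude_agent))
```

Because the two terms are simply added, raising either factor while holding the other fixed always raises social presence, which is the independence the model asserts.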

Because we typically build digital IVEs, including interpersonal ones, using 
visual media, we tend to think of realism in terms of photographic realism. 
Although important, photographic realism does not equate with behavioral realism 
and is, in fact, less important. For example, the expressive realism of facial movements 
trumps photographic fidelity in terms of interpersonal communication [12] and 

The Social Life of Avatars 

Figure 8.2 depicts the social influence component, a threshold, of our model. At 
some point along the social presence continuum, the threshold will be met or 
crossed, and interpersonal or social influences will begin to operate (i.e., above the 
threshold). The threshold has been depicted in Figure 8.2 as linear. Because social 
presence need not be linear, the social influence threshold need not be - but will 
remain - orthogonal to the social presence curve. 

Figure 8.2 depicts a steep social influence threshold. In this case, if agency is 
low, behavioral realism must be very high for the social influence threshold to be 
met. Conversely, if behavioral realism is low, agency must be high for the social 
influence threshold to be met. In other words, if the individual believes that the 
representations of virtual others in the digital IVE are agents, the behavioral 
realism of the representations must be relatively high for social influence to occur. 
If the individual believes that the representations of virtual others in the digital IVE 
are avatars, their behavioral realism need not be high. How high or low agency or 
behavioral realism must be in these cases is an empirical question. 
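One hypothetical way to make this trade-off concrete is to treat the threshold in Figure 8.2 as a straight line in the agency-by-behavioral-realism plane, with social influence occurring on or above the line. The pivot at agency 0.5, the default `location`, and the slope value standing in for a "steep" threshold are all illustrative assumptions; as the text notes, the actual values are an empirical question.

```python
def _clip(x: float) -> float:
    """Keep a value on the hypothetical 0-to-1 scale."""
    return min(1.0, max(0.0, x))


def required_realism(agency: float, slope: float, location: float = 0.5) -> float:
    """Behavioral realism needed for social influence at a given level of agency.

    The threshold line pivots at agency = 0.5; `location` is the realism required
    there, and `slope` is how quickly the requirement falls as agency rises.
    """
    return _clip(location - slope * (agency - 0.5))


def required_agency(realism: float, slope: float, location: float = 0.5) -> float:
    """The same line solved the other way: agency needed at a given realism."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    return _clip(0.5 + (location - realism) / slope)


def influence_occurs(agency: float, realism: float, slope: float,
                     location: float = 0.5) -> bool:
    """True when the (agency, realism) point lies on or above the threshold."""
    return realism >= required_realism(agency, slope, location)


STEEP = 0.9  # hypothetical slope standing in for the steep threshold of Figure 8.2

if __name__ == "__main__":
    print(required_realism(0.0, STEEP))       # an agent needs very high realism
    print(required_agency(0.1, STEEP))        # low realism requires high agency
    print(influence_occurs(1.0, 0.1, STEEP))  # a crude avatar still crosses
```

Solving the one line in both directions captures both halves of the argument: low agency demands high realism, and low realism demands high agency.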


8.3.2 Factors Internal to the Target of Social Influence 

The major internal factors specified in our model are interpersonal self-relevance 
and response system. 

We argue that interpersonal self-relevance, the importance of interpersonal 
interactions to the individual’s sense of self, operates in basically the same 
manner within synthetic or virtual social worlds as in natural social environments. 
In interpersonal contexts, the self-relevance of social interactions is a function of 
the importance of the interpersonal interactions and relationships involved 
vis-a-vis the actor’s goals and desires. For example, if the goal of an interaction 
involves open discussion of one’s deeply held beliefs or attitudes, we would expect 
interpersonal self-relevance to be high. On the other hand, if the goal of an 
interaction does not involve central or core aspects of an individual, we would 
expect interpersonal self-relevance to be lower. It is important to note that 
interpersonal self-relevance can be high for reasons other than an interaction goal 
involving core beliefs or attitudes. Affect can be involved: for example, the 
individual may want to perform in such a way as to be favorably evaluated by others 
present in an environment. Stronger emotions may also be involved: for example, the 
nature of the relationship itself can affect interpersonal self-relevance, as when 
the goal of an interaction is increased intimacy. 

As depicted in Figures 8.3A and 8.3B, the slope of the threshold of social 
influence varies as a function of interpersonal self-relevance in our model. If 
interpersonal self-relevance is high, the slope is steep (see Figure 8.3A). If 
interpersonal self-relevance is low, the slope is shallow (see Figure 8.3B). Hence, 
we would expect a steep slope for interactions relevant to individuals’ core beliefs, 
evaluation of task performance by others, and falling in love. We would expect a 
shallower slope for interactions irrelevant to core beliefs (e.g., a transaction such 
as a savings account withdrawal) or involving performance that is less 
self-relevant. 

The location of the social influence threshold also varies as a function of the 
response system used to assess the operation of social influence effects: the 
threshold shifts as the level of the behavioral response system changes. Hence, for 
low-level response systems such as unconscious reflexes, actions, and other 
processes, the social influence threshold is much lower than for high-level 
response systems, such as the ones involved in conscious and purposeful 
actions - for example, verbal communication. Figures 8.4A and 8.4B depict these 
relationships. 
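Extending the same hypothetical straight-line reading of the threshold, the two internal factors can be sketched as two parameters: interpersonal self-relevance sets the slope of the threshold (Figures 8.3A and 8.3B), and the response system level sets its location (Figures 8.4A and 8.4B). Every numeric value below is an assumption chosen only to show the direction of each effect.

```python
from enum import Enum


class SelfRelevance(Enum):
    """Hypothetical threshold slopes."""

    LOW = 0.2   # shallow, e.g., a savings account withdrawal
    HIGH = 0.9  # steep, e.g., core beliefs, evaluated performance, intimacy


class ResponseLevel(Enum):
    """Hypothetical threshold locations (realism required at agency = 0.5)."""

    LOW = 0.3   # unconscious reflexes and other low-level processes
    HIGH = 0.5  # conscious, purposeful actions such as verbal communication


def required_realism(agency: float, relevance: SelfRelevance,
                     level: ResponseLevel) -> float:
    """Behavioral realism needed for social influence under the given internal factors."""
    needed = level.value - relevance.value * (agency - 0.5)
    return min(1.0, max(0.0, needed))


def influence_occurs(agency: float, realism: float, relevance: SelfRelevance,
                     level: ResponseLevel) -> bool:
    """True when the (agency, realism) point lies on or above the threshold."""
    return realism >= required_realism(agency, relevance, level)


if __name__ == "__main__":
    # A fairly realistic agent: agency 0.0, behavioral realism 0.8.
    for relevance in SelfRelevance:
        for level in ResponseLevel:
            print(relevance.name, level.name,
                  influence_occurs(0.0, 0.8, relevance, level))
```

Under these assumed values, a fairly realistic agent in a highly self-relevant interaction crosses the threshold for a low-level response but not for a high-level one, which illustrates why a high-level measure could miss influence that a low-level measure detects.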


Figure 8.3A Steep threshold of social influence (horizontal axis: agency, from 
agent to avatar). 

Figure 8.3B Shallow threshold of social influence (horizontal axis: agency, from 
agent to avatar). 
Of course, to measure social influence effects on low-level responses, one must 
use measures that do not confound social influence effects with response system 
level. Consequently, one would not expect to assess social influences that occur on 
low-level systems with measures requiring high-level responses such as 
self-reports. An important implication here, as mentioned above, is that social influence 
and its underlying antecedent or cause, social presence, may occur without the 
participant’s awareness and ability to report it. Hence, subjective self-report 
assessments of social presence may work for high-level response systems, but not 
for low-level ones. 

Figure 8.4A Social influence threshold for high-level responses (horizontal axis: 
agency, from agent to avatar). 

Figure 8.4B Social influence threshold for low-level responses. 


8.3.3 Moderating Factors 

Much social psychological research has been devoted to the Sisyphean task of 
investigating and understanding the myriad variables that affect social 
influence processes. These variables include organismic ones, such as age, sex, and 
health; demographic ones such as race, socio-economic status, and religion; 
dispositional ones such as personality traits, temperament, and intelligence; 
psychological ones such as opinions, beliefs, and attitudes; and social ones such as 
family, group membership, and social power. These are the “other things” in the 
usual caveat, “All other things being equal...”. And, of course, they never really 
are. 

We expect that these types of variables moderate social presence and 
consequently the threshold of social influence in IVEs in a way that is idiosyncratic 
to the individuals within them. For example, we might expect individuals who are 
dispositionally high in empathy or suggestibility to experience higher levels of 
social presence within an IVE than individuals low on these dispositions. 


8.4 Research 

We believe we are the first card-carrying social psychologists to explore the utility 
of digital IVE technology for conducting social psychological research, though 
other scientists [e.g., 13, 14] were certainly there first. Our initial foray involved 
attempts to replicate classic social influence effects in digital IVEs. We reasoned 
that if we could replicate these effects in digital IVEs, then IVE technology would 
hold great promise for experimental social psychology. We were not disappointed. 

However, we also learned that we could answer important theoretical questions 
regarding social influence that could not be answered easily - if at all - using 
traditional experimental techniques. Hence, our more recent forays have involved 
attempts to extend social psychological theory and research in new directions by 
using IVE technology to distinguish between low- and high-level social 
psychological processes and to perform experimental manipulations impossible 
without it. Before turning to examples of our research, we briefly describe our 
technology. 
8.4.1 Immersive Virtual Environment Technology 

Readers should have some idea of our IVE technology in order to understand better 
the research described below. I will not attempt to describe this technology in great 
technical detail, but rather in a more general way. We provide more technical detail 
in other places [3, 10]. Also, our digital IVE technology represents only one of 
several immersive systems available to researchers. We see three major advantages 
of our technology: first, it costs far less than many other systems (currently 
US$25,000-40,000 per user, depending on features); second, it permits multi-person 
interaction in digital IVEs within and between installations; third, it is 
relatively portable. 

Compelling digital IVEs derive from precise integration of hardware and 
software. The former includes computers, display devices, and tracking systems. 
The latter includes digital world development and scripting tools, rendering 
engines, and databases. Figure 8.5 depicts the basic components of our system. 
Andy Beall, the director of systems development at our Research Center for 
Virtual Environments and Behavior (RECVEB), has designed and refined this 
system. 



Figure 8.5 RECVEB IVET system schematic. 
Users wear a headset that includes up to three physically small tracking devices 
that comprise the critical position and orientation tracking system (A in Figure 
8.5). A small LED on top of the headset allows an external video-based system to 
track the user’s gross movements in three dimensions within the physical space in 
which the digital IVE experience takes place. An inertial system tracks the user’s 
head movements in three dimensions, and a small photo-optical system tracks the 
user’s eye movements. Additional tracking apparatus for body and limb 
movements as well as facial expressions can easily be added as research and 
rendering require. 

The resultant position and orientation information is continuously fed to a rendering 
computer (B in Figure 8.5) that identifies the appropriate visual information from 
the digital IVE to display back to the user via the headset’s stereoscopic video 
displays, permitting three-dimensional views. The headset also contains a 
microphone and headphones for audio capture and rendering. Typically, we 
maintain less than 40 milliseconds lag time between tracking and rendering, a lag 
imperceptible to users. Because lag time is largely a function of computer 
processing speeds for any given digital IVE, advances in these speeds will only 
decrease lag times and/or increase the complexity (i.e., visual fidelity) of digital 
IVEs in the future. 
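The lag budget can be sketched as a sum of stage times checked against the 40-millisecond figure given above. The three stages and their millisecond values are hypothetical, and the speed-up helper assumes, as a simplification, that only rendering time scales with processing speed; any headroom freed this way is what could instead be spent on more complex IVEs.

```python
from dataclasses import dataclass, replace

LAG_BUDGET_MS = 40.0  # stated target: under 40 ms between tracking and rendering


@dataclass(frozen=True)
class PipelineTimings:
    """Hypothetical per-frame stage durations, in milliseconds."""

    tracking_ms: float  # read position (video/LED) and orientation (inertial) data
    transfer_ms: float  # deliver the tracking data to the rendering computer
    render_ms: float    # draw the stereoscopic views for the current head pose

    def total_lag_ms(self) -> float:
        return self.tracking_ms + self.transfer_ms + self.render_ms

    def headroom_ms(self, budget_ms: float = LAG_BUDGET_MS) -> float:
        """Time left in the budget (negative if the budget is exceeded)."""
        return budget_ms - self.total_lag_ms()

    def within_budget(self, budget_ms: float = LAG_BUDGET_MS) -> bool:
        return self.total_lag_ms() < budget_ms


def with_faster_processor(timings: PipelineTimings, speedup: float) -> PipelineTimings:
    """Assume (simplistically) that only rendering time scales with processing speed."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return replace(timings, render_ms=timings.render_ms / speedup)


if __name__ == "__main__":
    today = PipelineTimings(tracking_ms=8.0, transfer_ms=4.0, render_ms=30.0)
    faster = with_faster_processor(today, speedup=2.0)
    print(today.total_lag_ms(), today.within_budget())    # over budget
    print(faster.total_lag_ms(), faster.within_budget())  # under budget
    print(faster.headroom_ms())  # time that could go to more complex worlds
```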

In our system, users can move about the digital IVE freely by walking, running, 
kneeling, jumping, etc. We believe this functionality contributes greatly to users’ 
immersive experiences, and hence their sense of presence, because their kinesthetic 
and vestibular feedback systems operate consonantly with the digital VE. 


8.4.2 Replication of Classic Social Influence Effects within IVEs 


8.4.2.1 Social Facilitation 

We began replicating classic social influence effects by focusing on social 
facilitation/inhibition effects. Indeed, social facilitation/inhibition effects 
comprised the first report of experimental social psychological research in the 
literature [15] and have been a continuing topic of research in social psychology 
for more than a century [16]. 

Social facilitation and inhibition effects occur when an individual performs a task 
in the presence of others, for example, observers or an audience, and are assessed 
in comparison to task performance while alone. Facilitation, or increased 
performance, occurs when an individual performs a well-learned task with others 
present. Inhibition effects occur when an individual performs a novel, or not 
well-learned, task with others present. 

In our study [17], participants sat immersed in a digital IVE consisting of a small 
room containing a virtual computer monitor and chairs. We randomly assigned 
participants to learn one of two tasks while alone in the IVE: number 
categorization or pattern recognition. The former required participants to 
monitor, into one of two groups. This task required participants to infer or learn the 
classification rules via differential auditory feedback (i.e., a high pitched tone for a 
correct classification and a low pitched tone for an incorrect classification) they 
received after each trial. For the pattern recognition task, participants viewed a 5x5 
matrix of letters on which a five-letter word was highlighted. Participants were 
required to indicate whether the highlighted word appeared in the “correct” pattern 
(a right angle) no matter the pattern rotation. Participants performed blocks of 
twenty trials on the task to which they had been assigned until they achieved 80% 
correct responses on two consecutive blocks of trials. Subsequently, participants 
performed either the learned or the novel task either alone or in the presence of two 
virtual observers whom they were led to believe were either computer agents or 
human avatars. Figure 8.6 depicts a snapshot of the participant’s view with 
observers present. 
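The learning criterion described above (blocks of twenty trials, repeated until a participant scores at least 80% correct on two consecutive blocks) amounts to a simple stopping rule. In this sketch, `run_block` is a hypothetical callback standing in for the task software, and the `max_blocks` safety limit is an assumption not mentioned in the study.

```python
from typing import Callable, List

BLOCK_SIZE = 20           # trials per block
CRITERION_CORRECT = 16    # 80% of 20 trials
CONSECUTIVE_REQUIRED = 2  # blocks in a row that must meet the criterion


def train_to_criterion(run_block: Callable[[int], int],
                       max_blocks: int = 50) -> List[int]:
    """Run blocks until the criterion is met on consecutive blocks.

    run_block(n) administers n trials and returns the number answered correctly.
    Returns the correct-response count for every block that was run.
    """
    history: List[int] = []
    streak = 0
    for _ in range(max_blocks):
        correct = run_block(BLOCK_SIZE)
        history.append(correct)
        # A block below criterion resets the run of consecutive successes.
        streak = streak + 1 if correct >= CRITERION_CORRECT else 0
        if streak == CONSECUTIVE_REQUIRED:
            return history
    raise RuntimeError("criterion not reached within max_blocks")


if __name__ == "__main__":
    scripted = iter([10, 17, 12, 16, 18, 20])  # made-up scores for illustration
    print(train_to_criterion(lambda n: next(scripted)))
```

With the invented scores shown, the block of 12 after a passing block of 17 resets the count, so training stops only after the consecutive blocks of 16 and 18. Counting correct responses as integers avoids any floating-point comparison against 0.8.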



Figure 8.6 Social facilitation/inhibition IVE. 


The predicted social inhibition effects occurred. Participants performed 
significantly more poorly on a novel task in the presence of mere mute observers 
than alone. However, the predicted social facilitation effect did not occur. The lack 
of the latter finding did not discourage us because of the likely operation of ceiling 
effects: our 80% initial learning criterion decreased the statistical power of our 
design (i.e., there was little room for increased performance). Importantly, the 
agent/avatar manipulation produced an effect. That 
is, social inhibition of participants’ performance occurred only when the observers 
were avatars. 
8.4.2.2 Conformity 

We also investigated conformity as a classic social influence process using digital 
IVE technology. Conformity processes drive individuals to adhere to established 
group norms. We examined conformity in terms of risk-taking behavior. To do 
this, we developed a digital immersive virtual casino containing a blackjack table 
and dealer and other players. Figure 8.7 provides a snapshot view of the virtual 
casino. 



Figure 8.7 Casino IVE. 

In our study [18], participants played a round of blackjack alone with just the 
dealer in the casino. We stopped them after 20 hands. Next, they played a round of 
blackjack with two other players at the blackjack table. We manipulated betting 
norms by ensuring that the other players systematically bet more, less, or about the 
same compared to participants’ average bets in the solitary session. Again, we led 
participants to believe the other players were either agents or avatars. 
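The betting-norm manipulation can be sketched as scaling each participant's solitary average bet. The multipliers for the low, same, and high norms and the small random jitter (which keeps the other players' bets from looking mechanical) are hypothetical; the study specifies only that the other players bet systematically more, less, or about the same.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence

# Hypothetical multipliers applied to the participant's solitary average bet.
NORM_MULTIPLIERS = {"low": 0.5, "same": 1.0, "high": 1.5}


def other_player_bets(solitary_bets: Sequence[float], norm: str, n_hands: int,
                      jitter: float = 0.1,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Bets for a scripted player, centered on the chosen norm.

    Each bet is the norm target scaled by a random factor in [1 - jitter, 1 + jitter].
    """
    if norm not in NORM_MULTIPLIERS:
        raise ValueError(f"unknown norm: {norm!r}")
    if rng is None:
        rng = random.Random()
    target = mean(solitary_bets) * NORM_MULTIPLIERS[norm]
    return [round(target * (1 + rng.uniform(-jitter, jitter)), 2)
            for _ in range(n_hands)]


if __name__ == "__main__":
    solitary = [10, 15, 10, 20, 5]  # made-up solitary-round bets
    print(other_player_bets(solitary, "high", n_hands=5, rng=random.Random(1)))
```

Passing a seeded `random.Random` makes a session's scripted bets reproducible; the same function serves both the agent and avatar conditions, since only participants' beliefs about the other players differ.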

As expected, and as Figure 8.8 depicts, participants significantly increased and 
decreased their bets from their average bets in the first solitary round in the high 
and low group betting norm conditions, respectively. In other words, they 
conformed their betting behavior to that of the other players. Their betting average 
did not differ between the group and solitary rounds in the “same” norm condition. 

Together with the social inhibition data, these results contribute to our 
confidence that valid and valuable experimental social influence research can be 
conducted using digital IVE technology. Of course, this should not have surprised 
us given that social psychologists have been creating IVEs for decades, just not 
digital ones. 



Figure 8.8 Average bets in the blackjack conformity study (low, same, and high bet 
norm conditions, shown separately for the agent and avatar conditions). 


8.4.3 Using Digital IVEs to Distinguish between Low- and High-Level Social Influence Processes 

Interestingly, although the conformity patterns held in the presence of both agents 
and avatars in the blackjack study, as Figure 8.8 depicts, participants made higher 
average bets in the presence of avatars than in the presence of agents across all 
three conformity conditions. Recall that in the social facilitation study, social 
inhibition effects occurred only in the condition employing avatars. 

Based on our threshold social influence model, the pattern of agent versus avatar 
effects in the conformity study suggests that conformity involves low-level social 
influence processes such that the underlying social presence occurs without the 
participant’s awareness and perhaps the ability to report it. 
8.4.3.1 Social Comparison 

However, the increased average bets in the avatar conditions of the blackjack study 
suggest that a higher-level social influence process was operating in the avatar 
condition (see Figure 8.8). At every normative level (i.e., low, same, and high), 
participants in the avatar condition bet more on average than those in the agent 
condition. The most likely underlying high-level social influence process is another 
classic one: social comparison. Social comparison drives individuals to compare 
themselves favorably to their peers on relevant factors, including behaviors [19]. 

Recreational risk-taking has long been regarded as a valued behavior in our 
culture [8], and social comparison theorists have demonstrated that individuals view 
themselves as more willing to take risks than their peers. Hence, social comparison 
theorists would explain our agent/avatar difference relatively simply. When participants can compare 
their bets to another real person (i.e., the avatar condition), they would tend to bet 
more than when they could not (i.e., the agent condition). Of course, conformity 
pressures still existed so that participants in the avatar condition conformed 
generally to the betting norms but kept their own bets a little above the norm. 
When there was no real person (i.e., the agent condition), social comparison 
processes were not activated. 


8.4.3.2 Proxemics 


We decided to examine proxemics, the study of interpersonal distance and personal 
space, to explore low- and high-level social influence processes more directly 
using digital IVEs. We expected that proxemics would involve low-level social 
influence processes and that we should see little if any difference between 
participants interacting with agents and those interacting with avatars. However, we 
also predicted that we would observe independent effects of behavioral realism on 
proxemic behaviors. 

In the first study [20], we placed participants in a digital IVE room about 5 
meters square that contained a representation of a single agent. Each agent wore a 
sweatshirt with a name and number printed on the front and back. We 
asked participants to approach the agent and learn its name and number for a later 
recall task. We systematically varied the behavioral realism of the agents. In the 
low behavioral realism condition, agents simply stood manikin-like with closed 
eyes. In the high behavioral realism condition, agents watched participants 
approach with open and naturally blinking eyes, maintaining mutual gaze with 
them (except when participants walked behind the plane parallel to the agents’ 
shoulders). Participants performed many trials of this task, each time approaching a 
different agent. 

Unbeknownst to participants, we tracked their movements within the digital IVE. 
These data were collected automatically via our tracking system (see above). 
Analyses revealed significant differences as a function of behavioral realism: 
participants maintained more space around the agent in the high than in the 
low behavioral realism condition. 
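The text does not specify which distance measure the analyses used. As a minimal sketch, assuming tracked positions are stored as (x, z) floor coordinates in meters and that "space around the agent" is summarized by each trial's closest approach, the per-trial measure and a per-condition average could be computed as follows. The function names, data layout, and example paths are hypothetical and are not data from the study.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical data layout: a floor position is (x, z) in meters.
Point = Tuple[float, float]


def min_distance_to_agent(path: Sequence[Point], agent: Point) -> float:
    """Closest approach (meters) of one tracked path to a stationary agent."""
    if not path:
        raise ValueError("path must contain at least one tracked position")
    ax, az = agent
    return min(math.hypot(x - ax, z - az) for x, z in path)


def mean_min_distance(trials: List[Tuple[Sequence[Point], Point]]) -> float:
    """Average closest approach over all (path, agent) trials in one condition."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(min_distance_to_agent(p, a) for p, a in trials) / len(trials)


# Illustrative, made-up paths; the agent stands at (2.0, 1.0) in both.
high_realism = [([(0.0, 0.0), (1.0, 0.0), (1.5, 0.5)], (2.0, 1.0))]
low_realism = [([(0.0, 0.0), (1.5, 0.5), (1.9, 0.9)], (2.0, 1.0))]
print(mean_min_distance(high_realism), mean_min_distance(low_realism))
```

Under this assumed measure, a larger average closest approach in the high realism condition would correspond to the pattern reported above; other summaries (for example, the area of the region participants avoided) would be equally plausible choices.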

Figure 8.9 presents typical examples of participants’ paths in the two behavioral 
realism conditions. We have since replicated this study, varying not only behavioral 
realism but also agency. We found, as expected, that agency does not mediate social 
influence effects on proxemics. 

These and the serendipitous social comparison findings described above increase 
our confidence in the value of our model and of digital IVE technology for helping 
us to distinguish low- from high-level social influence processes. 


Figure 8.9 Proxemics data: typical participant paths in the high and low behavioral realism conditions (room boundaries in meters).


8.5 Long-term Prospects of Digital IVE Technology for Research on the Social Lives of Avatars


The potential benefits of digital IVE tools for carrying out traditional experimental 
social psychological research are clearly quite high, as we have argued and 
demonstrated here and elsewhere [3]. In addition, we have learned that the value of 
digital IVE technology is potentially much greater than its utility as a powerful 
methodological tool to employ in traditional ways (i.e., merely replicating physical 
IVEs with digital ones). 

Not only can we now more easily distinguish low- from high-level social 
influence processes and inform social psychological theory as well as the 
development of social IVEs in the future, but we also can implement experimental 
manipulations nearly impossible to accomplish otherwise. For example, we can 
manipulate social identity within digital IVEs by experimentally varying the 
organismic and demographic characteristics of the participant, such as gender, 
race, physical size, and age, allowing us to randomly assign participants to social 
identity conditions. We can also allow participants to interact with themselves, and 
have done so [21], to explore theoretical notions of the self that were heretofore 
untestable. 

Designing digital IVEs has challenged us in other unexpected ways. We learned 
that digital IVEs could not only help us to perform important but somewhat 
mundane yeoman research more easily and efficiently, but they could also help 
us to tease apart the differences among Allport’s social presence distinctions, i.e., 
actual, implied, and imagined. Social psychologists have paid scant attention to 
these distinctions, assuming that the same social influence processes underlie all 
three. Perhaps this assumption is correct. Perhaps it is not. However, digital IVE 
technology provides the means to explore that question empirically. 

We believe that the ever-expanding utility of digital IVEs as a methodological 
tool foreshadows substantial strides in social psychological research and theory. 
And we expect we will continue to be challenged by it in unexpected ways. By 
virtue of good theory guiding practical development of better social IVE 
applications in the future, the social lives of avatars can only improve. 
Chapter 9 

Meeting People Virtually: Experiments in Shared Virtual Environments 

Mel Slater and Anthony Steed 


9.1 Introduction 

The BBC TV series Dr Who popularized the race of beings known as the “Daleks” 
[1]. A Dalek is a creature that is completely encased in a metallic shell, by means of 
which it can slide over the ground, but only over certain types of flat, smooth, 
electrically conducting surfaces. It has three limbs: one used as an eye-piece 
delivering a relatively small field of view, another which is a weapon, and a third 
which acts as an end-effector for the manipulation of objects. There is no direct 
evidence about Daleks’ auditory capabilities. However, these are unlikely to be good, 
since Daleks have a habit of repeating most things they say several times. Daleks 
tend to shout rather than talk, another indication of poor auditory capability and a 
lossy information channel. This evolutionary development was the result of a 
war-induced nuclear holocaust thousands of years in the past. 

Daleks experience the world in only one particular way - as Daleks. They 
typically cannot survive outside of their metal casing. Humans on the other hand 
have a very different evolutionary history. They have relatively flexible bodies, 
including hands capable of very precise manipulations and gross manoeuvres of 
great strength. Hands are also capable of expressing intended emotion through 
touch and may also be used as weapons. Humans have feet capable of transport of 
the body through widely varying types of terrain. Their visual and auditory systems 
are certainly far superior to those of the Daleks. Humans have relatively sophisticated and 
powerful communication capabilities through speech - not only for expressing 
ideas, but also for the expression of deep emotions as in music - quite unlike the 
limited command-only mode of speech characteristic of Daleks. 

This essay is about an increasing tendency of humans to embody themselves in 
worlds where they have Dalek-like capabilities, and where they meet and interact 
with others similarly engaged. Such worlds are called “Collaborative Virtual 
Environments” (CVEs) and also “Shared Virtual Environments” - the latter 
perhaps a more accurate term, since an environment itself cannot be 
“collaborative”, and the former term begs the question of whether the individuals 
in it can actually collaborate. 

This chapter reviews some experiments that explore what happens when humans 
do meet one another in such Dalek-like situations. How do they interact with one 
another? To what extent do social conventions hold? Can people become shy or 
angry or generally experience social discomfort? If the group of people so meeting 
have a task to do in common, does a leader emerge, and is leadership capability 
associated with their computational resources? This chapter considers the question 
of what happens when real people meet other real people, but through the medium 
of virtual space and virtual personal embodiment. 

There is a subsidiary question: suppose the beings that are met by a human in a 
shared virtual environment are not actually human, but virtual characters controlled 
by a computer program. What happens when such real people meet other virtual 
people - virtual not just in their representation, but in their state of being? The 
same questions can be asked: do real people experience, for example, social 
discomfort if a virtual being does something socially unpleasant to them, or, on the 
other hand, do they experience pleasure when a virtual person is “nice” to a real 
person? 

It is likely that more and more interaction between people will take 
place through encounters where the shared space is entirely virtual. The people 
involved may be physically separated across thousands of miles. The normal 
capabilities of people (visual, auditory, locomotory, kinesthetic and tactile) may be 
grossly impaired compared to reality, and there may even be doubt about whether 
some of those involved are “real people” or just “virtual people” with an 
automatically determined behaviour. It is important to understand how the 
technology influences the likely social behavior, and the consequences of this for 
the types of social interaction that might develop. 

The chapter reports a series of experimental studies that focus on these questions. 
The scope of the first set of studies was that of real people meeting in a virtual 
setting - where the real people had never met in reality, and only interacted 
through the virtual medium. The people met in order to carry out a task designed 
by the experimenters. A further series of studies examines the reaction of people to 
audiences in a virtual setting, where those audiences are completely virtual. The 
purpose was to study how the response of the real people varies with the response 
of the virtual audience. 

In Section 9.2 the scenario for the first group of experiments is described. Two 
main issues are considered: the social comfort or discomfort experienced by the 
participants (Section 9.3) and the emergence of a task leader (Section 9.4). In 
Section 9.5 a two-person interaction study is described that focused on the 
relationship between quality of communication and gaze timing between two 
people. In Section 9.6 three-person interaction is considered in the context of an 
experiment involving two actors learning a short play in the presence of a director. 
In Section 9.7 there is a report of a series of studies on reactions to a virtual 
audience. Conclusions are presented in Section 9.8. 
9.2 Experimental Scenario for Three Shared VE Experiments 


9.2.1 Organization 

Three different experiments have been carried out, each involving several groups 
of three people. Different groups were used for the different experiments. The 
first experiment was carried out at University College London (UCL) and 
involved 10 groups, each of which first met virtually to carry out a task and then 
continued the same task meeting together physically [2]. The second experiment 
involved 4 groups of three participants in widely different locations - one at UCL, 
one at the University of Nottingham, and the third at the Swedish Institute of 
Computer Science (SICS) [3]. The groups carried out the same task as in the 
original experiment, but these people never met physically. The final set of 
experiments consisted of 20 groups of three, with participants located at UCL, 
Nottingham and Integrated Information Systems (IIS) in Athens, Greece [4]. Again, 
the people never met to continue the same task for real. The features common to 
all three experiments are described next. 

Each subject was assigned a color (Red, Green, or Blue) as their name, and this 
color also signified their appearance. The person named “Red” had a red avatar, 
and similarly for the other two colors. No real names were ever used - the three 
people only referred to themselves and to others as Red, Green or Blue. 

Wherever located, each subject was assigned to a “minder” responsible for taking 
them through the various stages of the experiment. They were first introduced to 
the system that they would be using. This was either a desktop system or an 
immersive system with a head-mounted display (HMD). The virtual environment 
displayed was actually a rendition of a real laboratory at UCL. 

The first task was for each individual to learn to move through the environment. 
Then, at a signal from the overall controller of the experiment to the various 
minders, each subject was given a sheet describing the task to be performed. Then, 
on another signal, they were invited to put on earphones, and to introduce 
themselves to one another, only referring to themselves and to the others by their 
color. 

The task was to locate a room in which sheets of paper had been stuck around the 
walls. Each sheet had several words in a column, each word 
preceded by a number. The words across all sheets with the same prefix number 
belonged to a “saying”. For example, “A critic is a man who knows the way but 
can’t drive a car” would be split up across several sheets, but each word belonging 
to this sentence had a common prefix number. The task was first to figure out that 
this was the puzzle to be solved, and second to unscramble as many of these 
sayings as possible. 
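The structure of the puzzle can be illustrated with a small sketch. Assuming each sheet is represented as a list of (prefix number, word) pairs, collecting words by prefix number recovers which words belong to one saying, but not their order; unscrambling that order was left to the participants. The representation, the function name, and the way the example saying is split across three sheets are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: a sheet is a column of (prefix number, word) pairs.
Sheet = Iterable[Tuple[int, str]]


def group_by_prefix(sheets: Iterable[Sheet]) -> Dict[int, List[str]]:
    """Collect, across all sheets, the words that share each prefix number."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for sheet in sheets:
        for number, word in sheet:
            groups[number].append(word)
    return dict(groups)


# The example saying from the text, split arbitrarily across three sheets.
sheets = [
    [(1, "man"), (1, "the"), (1, "A"), (1, "car")],
    [(1, "critic"), (1, "way"), (1, "can't"), (1, "is")],
    [(1, "knows"), (1, "a"), (1, "who"), (1, "drive"), (1, "but"), (1, "a")],
]
print(group_by_prefix(sheets))  # the words of saying 1, still unordered
```

Grouping only narrows the problem to a bag of words per saying; because no single sheet shows a whole saying, the group had to pool what each member could see before the ordering step could begin.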

The total scene consisted of approximately 3500 polygons and ran at a frame rate 
of no less than 20 Hz on any of the machines, including those machines generating 
stereo graphics for HMDs. 
9.2.2 Special Features of the UCL Experiment 

In the first experiment, the one entirely located at UCL, the subjects carried out the 
task in the virtual environment for 15 minutes. They then stopped to answer 
questionnaires individually. Then each subject was requested to put on a waistcoat 
corresponding to their color, and they were brought together to meet for real. They 
carried on the task for a further 15 minutes in the real laboratory on which the 
virtual laboratory had been modelled (the papers were stuck around the walls in 
the same arrangement as they had been virtually). After this real session, the 
subjects again answered their questionnaires, and then met together with the 
experimenter for a debriefing session. 

There was a further secret instruction given to the Green subject in the virtual 
part of this experiment only: to get in the way of the Red person as much as 
possible, trying always to be in Red’s line of vision. The purpose was to 
examine the degree of social discomfort generated in the group as a result of this 
behaviour. 

The Red person was using an immersive system employing a two-tracker Polhemus 
Fastrak system, a Virtual Research VR4 helmet, and a 3D mouse with five buttons. 
The Green and Blue subjects both used desktop systems. The virtual reality software 
used throughout was the Distributed Interactive Virtual Environments (DIVE) 
toolkit version 3.1 from SICS [5, 6]. A DIVE avatar was used for each of the 
participants, and was the same for each apart from the colour. See the left-hand 
humanoid figure in Figure 9.1. The sound system used was the Robust-Audio Tool 
(RAT) v.3.023 [7]. 

9.2.3 Special Features of the UCL-Nottingham-SICS Experiment 


In this experiment no person was immersed, though there were differences 
between the machines used. This design would help to establish whether immersion 
accounted for the leadership effect in the earlier experiment, or whether that effect 
had more to do with the speed of the machine. 

As in the first experiment, none of the avatars had any expressiveness or limb 
movements. In the first experiment the three avatars used were identical apart from 
color. In this second experiment, in order to examine the potential impact of avatar 
appearance, one avatar, the one at SICS, was different from the other two. The Red 
and Blue subjects had basic male caricature avatars (see the centre humanoid in 
Figure 9.1). The third (Green, at SICS) was represented in a more realistic manner 
(right-hand humanoid in Figure 9.1). It was also smaller than the other two. The 
SICS (Green) person would see two other avatars of the same cartoon-like 
appearance apart from color; whereas each of the other two participants would see 
one avatar looking very realistic and the other cartoon-like. None of the subjects 
had any idea of how they themselves were portrayed. 
When the Red or Blue people spoke, a wave-like emanation would be emitted from 
their avatars. This was not the case for Green - another difference between the 
Green person and the other two. As in the first experiment, DIVE and RAT were 
used to implement the shared environment. 



Figure 9.1 Avatars used in the UCL and UCL-Nottingham-SICS experiments. 


9.2.4 Special Features of the UCL-Nottingham-IIS Experiment 

For this experiment the participant at UCL was using an immersive system 
employing a two-tracker Polhemus Fastrak system, a Virtual Research VR4 helmet, 
and a 3D mouse with five buttons. The other two participants were using a desktop 
system with the full 1280 × 1024 screen and a standard 2D mouse with three 
buttons. Figure 9.2 shows three participants examining one of the posters. 

The experiment was implemented with the dVS/dVISE 5.0 system from Division 
Ltd. (now renamed DIVISION and sold by Parametric Technology Corporation). 
An audio server ran on the same machine, and sites were connected using the 
point-to-point connection capability in RAT. 
Figure 9.2 Red, Green and Blue examining one of the posters. 


9.3 Social Discomfort in Shared VEs 

There are two sources of results regarding social discomfort: detailed statistical 
analyses of the questionnaire responses, and reports of the debriefing sessions held 
with the subjects at the conclusion of the experimental trial. Detailed results of the 
statistical analyses of the questionnaire responses are available in [2]. Only the main 
conclusions are presented here. 


9.3.1 Comparing Virtual to Real 

In the first, UCL-based, experiment it was possible to compare responses given 
after the real and virtual sessions. Of course, it is impossible to know whether the 
different responses were simply due to a time effect (the real sessions were always 
conducted after the virtual ones). More experiments are needed in which the order of 
presentation is reversed. 

The questionnaire (the same one given after each of the virtual and real sessions) 
explored various aspects of accord between the group members, and for the aspects 
listed below there were significantly different responses between the virtual 
and real sessions. In each case the virtual session was experienced as the one with 
the greater level of social discomfort. This is shown in measures of: 

• Enjoyment of working with others in the group (compared to a previous 
enjoyable group experience). 

• The degree of isolation of members from the group. 

• The extent of feelings of comfort with the other group members. 

• The degree to which group members cooperated with one another. 

When all “accord” variables were used to construct one overall measure of group 
accord, there was a highly significant difference between the virtual and 
real sessions, with the real session resulting in a much greater level of overall 
accord than the virtual. 

These differences might not be due solely to the nature of a virtual 
compared to a real encounter. Another factor that was different between the two 
sessions was that in the virtual session Green was asked to “monitor” Red, while 
this was not the case in the real session. However, when the responses from the 
individuals are examined, there are no significant differences between Red, Green 
and Blue for any of the “accord” variables. 


9.3.2 The Impact of Interference 

Although it did not show up in the questionnaire responses, there was a clear 
impact of the monitoring task in the first, UCL-only experiment, where Green was 
given the additional instruction to get in Red’s way. 

In three of the ten groups there was an impact of this additional task by Green. In 
one group Red formed the opinion that Green was being deliberately destructive. 
All three members of this group (Red and Green male, Blue female) had a high 
sense of what they described as “paranoia” during the virtual session, which 
completely disappeared when they met for real. 

In another group Red did notice something different - but interpreted this as 
something being wrong with the avatar configurations. She said that “Everyone 
was supposed to be looking at the walls, but Green was looking at me”. In this 
same group, Green reported that “I felt I wasn’t being me” and “What on earth 
were they thinking of me?” She imagined that the other two were “wondering why 
I am doing this”. 

In another group the major impact was on Blue, who thought that Red and Green 
“know where they are - I felt excluded”. In other words Blue noticed that Red and 
Green seemed to be close to one another most of the time, and Blue was left out of 
this. 

One thing reported by almost all Green subjects was the difficulty of carrying out 
the monitoring task successfully. Green could not judge Red’s field of view. There 
was no “eye contact” in any meaningful sense, no “exchange of glances”. More 
generally this lack of feedback about body movements and body language from the 
avatars was mentioned by several people. 


9.3.3 Experiencing Avatars 

A major issue explored in the debriefings was the relationship of the people to their 
avatars. Individuals were respectful of the avatars of the other people, and tried to 
avoid carrying out actions that would cause distress or that would be impossible in 
real life. This process of being mindful of the avatars of others was not expected; 
they were taken seriously in spite of all their shortcomings. 

The analysis of the post-experimental group discussion revealed a surprising 
degree of attachment and relationship towards the avatars. Although, except by 
inference, the individuals were not aware of the appearance of their own bodies, they 
seemed generally to respect the avatars of others, trying to avoid passing through 
them, and sometimes apologizing when they did so. Here we can mention again 
that these were very simple avatars, with limited movement and no capability for 
any kind of emotional expression. 

The behaviors that were observed or reported included various types of 
embarrassment or social discomfort on accidentally walking through another 
person’s avatar. This behavior was variously described as rude, frightening, 
uncomfortable, disconcerting, or just plain “bad”. Some group members were 
observed to apologise when they walked through another person. 

Recall that in the second study (UCL-Nottingham-SICS) two of the avatars (Red 
and Blue) were cartoon-like, and the third looked very realistic, although it had no 
more functionality than the other two. These differing avatar representations 
clearly had an impact on the relationships that developed. In one group the realistic 
avatar was experienced as “scary” - it looked real but was like a “zombie” since 
the degree of realism raised expectations about corresponding realistic movements 
which, of course, did not occur. In another group, the major impact of the avatar 
representations was on the subject who saw the other two avatars as cartoons. This 
subject formed the belief that the cartoon-like avatars were not embodying real 
people but were “robots”, and as a result she cut down her communication with 
them. It was only when they laughed (“something a robot cannot do”) that she 
believed they were real, and initiated a greater level of communication. 

In another group, one of the subjects formed the opinion that the realistic avatar 
embodied an important person - for example, a senior person, or a managing 
director, someone with more privilege. This led to a further belief that this person 
would understand more about the system as a whole - simply because the body 
was more sophisticated than that of the other avatar that could be seen. 

In a final group there was no reported impact of the body representations. 
However, the UCL-based subject believed, because of poor audio in that session, 
that the other two subjects were talking in German, and possibly talking about him, 
and laughing at him. There was no particular reason for him to form this belief (of 
course, one of the subjects was in Nottingham and the other in Sweden, their 
common language being English). However, this became a very powerful belief, 
and he reported that this had a similar impact to what he would have felt in real 
life. “You become a minority” was his closing remark. 

In all three experiments, aspects of avatar functionality and behavior were 
reported as hindering the social collaboration. The lack of emotional 
expressiveness (except through voice) was mentioned overwhelmingly often. The lack of 
eye contact, body language, and even the ability to point at a reference object (one 
of the sheets of paper on the wall) were important drawbacks. Subjects seated at 
the screens could be seen to point frequently with their real hand at an item on the 
virtual wall, while speaking about it to the others. This seems to be an irrepressible 
behavior, even though the people are intellectually aware that their real gesture 
cannot possibly be seen by the other people in the group. Perhaps this is a 
significant argument for an immersive system, where participants really do point 
with their real hands which can be seen by the others involved. The impact of 
immersion on a particular aspect of social relationships is discussed in the next 
section. 


9.4 Emergence of a Leader in Shared VEs 

The nature of the task required collaboration for a successful resolution. Papers 
were distributed around the walls of a room, and the subjects were unable to take 
notes (even in the real meeting). Ideally each subject would stand in a different part 
of the room, and relate the words and numbers they could see to the group as a 
whole. The spatial arrangement of the group and the method of working would 
often come about through one person suggesting ideas to the others. This person 
would be perceived as leader and would also do most of the talking. In the 
questionnaire used for each experiment, each person was asked to rate the three 
people (including a self-assessment) on the extent to which they emerged as leader, 
and the extent to which they did the most talking. These two scores (leadership and 
talkativeness) were very highly correlated in the two experiments with enough 
subjects to make the correlations meaningful (the first, with 30 people, and the 
third, with 60 people). For example, in the third experiment, the correlation was 
over 0.9. 

In the first experiment one person was immersed and the other two were on 
desktop systems. Whether averaging the leadership and talkativeness scores, or 
counting the number of times that the immersed person was chosen as leader or 
chosen as the most talkative, there was a statistically highly significant bias 
towards the immersed person. This bias was not present for that person when the 
groups met for real. It should be noted at this point that the subjects did not know 
about the systems used by their colleagues, and would have had no idea that one of 
them was immersed. 

In the second experiment none were immersed, although there were differences 
in speed and performance of the various machines involved. However, no 
particular leadership pattern emerged. 

In the third experiment, again, one person was immersed, but the result here is 
more complex. On the raw data for leadership and talkativeness counts and scores 
there was no particular pattern in favor of the immersed person. The only 
conclusion from the raw data in this regard was that the person in Greece was 
almost never rated as leader. This was probably because the network connection 
was 5-10 times slower than that between London and Nottingham, and, of course, 
there was also the question of language difference. 

A detailed statistical analysis, however, revealed a more interesting picture. In 
the third experiment, the subjects answered a questionnaire concerned with their 
general social anxiety, called the Interaction Anxiousness Scale [8]. Generally a 
higher score means a higher level of general social anxiety, and average scores for 
the general population and also for those seeking counseling or psychiatric help are 
known. The hypothesis was that there would be a negative correlation between 
social anxiety and degree of leadership. This was indeed the case, with higher 
levels of social anxiety associated with a lower leadership rating. When social 
anxiety was factored out of the relationship between immersion and leadership 
score, an interesting result was found: for males, there was a clearly greater 
leadership score for the immersed subjects than for the non-immersed ones. For 
females, the immersed and non-immersed subjects had almost the same leadership 
rating. The reason for this is unknown: possible reasons include the general 
discomfort and weight of the HMD, and the fact that the use of the HMD system is 
certainly more intimidating than the familiar activity of just looking at a screen. 
Alternatively, it could just be a chance result, and clearly more studies are needed 
to reach any firm conclusions. Certainly, being able to compare behavior in the real 
and the virtual environment plays a crucial part in understanding issues around 
leadership, and this was not possible in the second and third experiments. 
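The text does not state exactly how social anxiety was "factored out". One standard way to remove the linear effect of a covariate is the first-order partial correlation, sketched below. It assumes, hypothetically, that immersion is coded 1 (HMD) or 0 (desktop), that leadership is the averaged rating, and that anxiety is the Interaction Anxiousness Scale score; since the result above was reported separately for males and females, the function would be applied within each group. The Pearson helper is also one way to compute a leadership-talkativeness correlation like the one reported earlier, though the chapter does not name the coefficient used. All names and coding choices are illustrative assumptions.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x: Sequence[float], y: Sequence[float],
                        z: Sequence[float]) -> float:
    """Correlation of x and y after removing the linear effect of z from both.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2)).
    """
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical usage within one gender group:
#   immersed   = 1 for the HMD user, 0 for desktop users
#   leadership = averaged leadership rating
#   anxiety    = Interaction Anxiousness Scale score
# r = partial_correlation(immersed, leadership, anxiety)
```

The formula is undefined when any variable has zero variance or when the covariate is perfectly correlated with x or y; in those cases the sketch raises a division error rather than returning a value.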


9.5 The Impact of Gaze Direction 

A major factor that inhibited the development of collaboration between people in 
the 3-person experiments was the lack of any eye-gaze contact between people. 
The eyes of the avatars were black dots in a fixed position. To look in a particular 
direction, the entire avatar would have to be rotated towards that direction - there 
was no independent head and body movement - in fact making the situation worse 
than what is possible for Daleks! In normal social discourse, eye gaze direction 
plays a major role in communication. Humans, uniquely amongst primates, 
repeatedly look into each other’s eyes during normal social discourse [9]. In humans 
alone amongst all primates, the sclera, the white area around the iris, emphasises gaze 
direction to others. Gaze may be used to assess the possible multifarious responses 
of others, to check for listening and understanding, to infer others’ intentions, and 
to assess the likely onset of a change in turn during a conversation [10, 11]. In the 
3-person experiments described in the previous sections, the avatars had black dots 
as eyes, and the inability to move eye gaze direction independently of the whole 
body was believed by the researchers, and expressed by the subjects, to be a major 
impediment to effective communication. 

In order to assess directly the importance of gaze direction, Garau et al. [12]
ran an experiment in which pairs of people carried out a negotiation
role-playing exercise under a number of different conditions. Communication
was via a video tunnel that displayed either an avatar head-and-shoulders view or
live video of each subject to the other. This was a between-groups design
with 50 pairs of individuals randomly assigned to one of four groups: (1) an audio-only
condition (13 pairs); (2) a random gaze condition, where the head and gaze
direction of the avatars were random within constraints to avoid impossible
configurations (12 pairs); (3) an inferred gaze condition, where the head was
tracked but the eye gaze direction was inferred from the conversational flow
between the two subjects (13 pairs); and (4) a video condition, where live video was
used rather than avatars (12 pairs). Conditions (1) and (4) served as baselines
for comparison, and the main focus of interest was (3). The inferred gaze
condition assigned probability distributions to the timing of gaze direction based on
turn-taking, in other words on pauses in speech, with the distribution means
taken from social-psychological data on turn-taking [10, 11]. Figure 9.3 shows the
male and female avatars used in conditions (2) and (3).




Figure 9.3 The male and female avatars showing different eye gaze directions. 
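The general idea behind the inferred gaze condition can be illustrated with a small sketch. It is not the model used by Garau et al.; it only shows how an avatar's gaze might alternate between the partner and elsewhere, with the timing depending on whether that subject is currently speaking or listening. The turn intervals are assumed to come from detecting pauses in each subject's speech, and the mean durations, the exponential distribution, the rule that the avatar looks at its partner at each change of turn, and the names `GazeParams` and `gaze_schedule` are all hypothetical placeholders rather than the turn-taking data from [10, 11].

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GazeParams:
    """Mean durations (seconds) of looking at / away from the partner.

    The values used below are hypothetical placeholders, not the
    turn-taking statistics that the study relied on.
    """
    at_partner_mean: float
    away_mean: float


# Assumption: listeners look at the partner longer than speakers do.
SPEAKING = GazeParams(at_partner_mean=1.5, away_mean=2.0)
LISTENING = GazeParams(at_partner_mean=3.0, away_mean=1.0)


def gaze_schedule(turns, seed=None):
    """Return contiguous (start, end, target) gaze segments for one avatar.

    `turns` is a list of contiguous (start, end, is_speaking) intervals,
    assumed to come from detecting pauses in the subject's speech.
    """
    rng = random.Random(seed)
    segments = []
    for start, end, is_speaking in turns:
        params = SPEAKING if is_speaking else LISTENING
        looking = True  # assumption: look at the partner at each turn change
        t = start
        while t < end:
            mean = params.at_partner_mean if looking else params.away_mean
            # Exponential durations are an illustrative choice of distribution.
            duration = rng.expovariate(1.0 / mean)
            seg_end = min(t + duration, end)
            segments.append((t, seg_end, "partner" if looking else "away"))
            t = seg_end
            looking = not looking
    return segments
```

For example, `gaze_schedule([(0.0, 4.0, True), (4.0, 9.0, False)], seed=1)` yields gaze segments for an avatar that speaks for four seconds and then listens for five. Driving timing from the speech signal, rather than from a tracker, is what makes this kind of gaze "inferred".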


The response variables were questionnaire-elicited measures of the degree to 
which the negotiation was experienced as being like a real face-to-face 
conversation, the participants’ degree of involvement in the conversation, the 
degree of copresence, and the extent to which the conversation, including 
evaluation of the partner, was enjoyed. Overall one may regard these four 
responses as an indicator of the quality of communication. 

As would be expected, the video condition produced the most favorable mean
results on all four responses. However, for the face-to-face and involvement
responses there was no significant difference between the video and inferred gaze
conditions, both of which were significantly higher than the other two conditions. The
random gaze avatar condition always produced the least favorable results. Only in
the case of the copresence response was the inferred gaze condition overtaken by
the audio condition (though the difference is not statistically significant). These
results suggest that it is not the mere presence of an avatar that is important; rather,
the avatar must be capable of expressing behaviour that is meaningful in
relation to the conversation, in this case eye gaze. One drawback of this study
is that the influence of eye gaze cannot directly be differentiated from that of head 
direction, since in the random condition, both were randomly generated, and in the 
inferred gaze condition, head direction was tracked while eye gaze was inferred. 
Hence although we know that gaze direction is important, we do not know whether 
the results can be attributed to head gaze or eye gaze or a mixture of both. A 
subsequent experiment will fill in this gap. 


9.6 Acting Rehearsals in a Virtual Environment 

Acting rehearsals provide one of the toughest tests for the utility of shared virtual 
environments. In the case of acting it is essential that actors communicate with one 
another with their full armory of emotional expression and non-verbal behavior. 
Actors learn their roles not by learning lines but by getting into the minds of the 
characters that they portray and communicating their thoughts and emotional state 
to the others involved, including, of course, the audience. For example, the strategy 
used by actress Tania Vujoshevich in her rehearsal of a role in a Bertolt Brecht
play was described as follows [13]: “In the beginning, during rehearsals, I would
replace Fritz’s name with my fiancé’s, and pretend I would never see him again,”
Vujoshevich says. By personalizing the situation, she found she was able to convey 
the pain and sadness and fear the character was feeling. 

Actors feed off one another’s expressions; they do not act in isolation. The ability
to express full non-verbal behavior is therefore crucial to the performance as a
whole, not merely to each actor in isolation.

The idea of acting rehearsals with participants at physically remote locations but 
interacting through a shared VE system was suggested as a scenario by Dr Ian 
Childs, Head of Research at the BBC. If successful it could have a number of 
economic and logistic advantages. This was undertaken as one of the projects 
sponsored by the Digital Virtual Centre of Excellence in Broadcast Multimedia, an 
initiative in the UK that was originally sponsored by the government. However, 
even before the research started, this seemed an impossible task, since it was known
from the experimental work described above that participants in a VE do not have
the capacity for anything like full emotional expression, and that this paucity of
expression is detrimental to communication. Nevertheless, we had some clues, in
particular that eye contact and facial expression were essential. We therefore developed
a desktop-based shared VE system and conducted an experiment during the summer of 1999 [14].

A script for a play of about five minutes was created based on a training
programme previously used at the UK’s National Film and Television School. The
script involved a man and a woman in a domestic kitchen scene at
breakfast. Three pairs of actors took part, and two directors were involved. Each
pair of actors was assigned to one director, with whom they met virtually on four
occasions over a two-week period. Each such meeting lasted approximately 30
minutes, and was followed by a questionnaire and debriefing session. On a fifth
occasion, the two actors and director in each group met again virtually for about 30
minutes, and then met physically for about 10 minutes in a rehearsal space. This
was followed immediately by a performance in front of a live audience. Before the
first meeting in the rehearsal space, none of the actors had met each other or the
directors physically at all.

The virtual rehearsal system was implemented using the DIVE 3.3x platform [6]. 
Desktop displays were used by all involved. The actors were embodied as avatars 
who could see and talk to each other, but the director was a disembodied voice 
with no visual representation. The frame rate was at least 20 frames per second 
throughout. 

Although the script was emotionally neutral, and remained substantially the same
for each rehearsal, the director required the actors to convey a different
subtext on different occasions. For example, it was early morning, and the night
before one of them had crashed their brand-new car. Or, they had been sharing the
same house together for some time as friends but had been intimate for the first
time the night before. In the fourth rehearsal they chose a subtext themselves,
usually a variation on one of those rehearsed beforehand. This final subtext was the
one to be rehearsed before the live audience.

Male and female avatars were constructed for the actors. Their arm movements 
could be controlled in a limited way (up or down), they could glide across the floor 
(Dalek-like), they could turn their heads, and they could make facial expressions. 
All of this was accomplished with keyboard, slider and mouse control - facial 
expressions in particular were controlled by simple strokes on a smiley face. Facial 
animation was based on a muscle model [15]. The facial animation interface 
allowed five major expressions, and mixtures of these, to be displayed: happiness, 
anger, surprise, sadness and a neutral expression. The interface was designed to 
enable the actor to specify an intensity for any one of the four non-neutral
expressions and also to allow for asymmetric facial expressions.
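An interface that mixes expressions at chosen intensities, separately for each side of the face, can be pictured as a weighted sum of per-expression muscle contractions. The sketch below assumes a tiny, hypothetical set of muscles and contraction levels; the actual muscle model [15] and its parameters are not described in this chapter, so `MUSCLES`, `EXPRESSIONS`, `blend_side` and `face_pose` and all numbers are illustrative placeholders.

```python
# Hypothetical muscle channels; a real muscle model such as [15] uses many more.
MUSCLES = ("zygomatic_major", "frontalis", "corrugator", "depressor_anguli")

# Hypothetical full-intensity contraction (0..1) of each muscle per expression.
EXPRESSIONS = {
    "neutral":   (0.0, 0.0, 0.0, 0.0),
    "happiness": (1.0, 0.2, 0.0, 0.0),
    "anger":     (0.0, 0.0, 1.0, 0.3),
    "surprise":  (0.0, 1.0, 0.0, 0.0),
    "sadness":   (0.0, 0.4, 0.5, 0.8),
}


def blend_side(weights):
    """Weighted sum of expression vectors for one side, clamped to [0, 1]."""
    totals = [0.0] * len(MUSCLES)
    for name, intensity in weights.items():
        if not 0.0 <= intensity <= 1.0:
            raise ValueError(f"intensity for {name!r} must be in [0, 1]")
        for i, level in enumerate(EXPRESSIONS[name]):
            totals[i] += intensity * level
    return tuple(min(1.0, v) for v in totals)


def face_pose(left, right=None):
    """Per-side muscle contractions; a separate `right` mix gives asymmetry."""
    right = left if right is None else right
    return {"left": blend_side(left), "right": blend_side(right)}
```

For example, `face_pose({"happiness": 0.8}, {"sadness": 0.6})` would produce a lopsided face that smiles on one side and droops on the other. Clamping keeps mixtures such as full happiness plus full surprise within the range a muscle can contract.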



Figure 9.4 An example of the male avatar, showing a possible body disposition 
and facial expression. The indented face in the top left corner is a mirror by which
the actor could judge the suitability of his own facial expression. 
After each of the first four sessions, a questionnaire was issued to the actors. This
was concerned with four main issues, each elicited by one or more questions on a 
7-point scale with 1 indicating a low degree and 7 a high degree of the 
corresponding attribute. The following questions were used for each of the four 
issues: 

1. Similarity to real rehearsals. 

• Think about your recent rehearsals in real life. To what extent did you feel 
that the rehearsal you have just experienced was similar to those 
rehearsals? 

2. Copresence 

This is the extent to which the computer becomes transparent and there is a sense 
of being with the other people in the VE. It was elicited by two questions: 

• In the rehearsal you have just experienced, to what extent did you feel that 
the other actor was in the space with you? 

• When you think back about your experience, do you remember this as 
more like just interacting with a computer or with other people? 

3. Presence 

This is the extent to which the individual is able to suspend disbelief and have a 
sense of being in the virtual space. It was elicited by two questions: 

• To what extent did you have the sense of being in the rehearsal space? 
(For example, if you were asked this question about the room you are in 
now, you would give a score of 7. However if you were asked this 
question about whether you were sitting in a room at home now, you 
would give a score of 1). 

• To what extent were there times during the rehearsal when the rehearsal 
space became the reality for you, and you almost forgot about the real 
world of the lab in which the whole experience was really taking place? 

4. Cooperation 

• To what extent, if at all, did you have a sense of cooperating with your 
acting partner? 
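The chapter does not say how the two-item issues (copresence and presence) were combined into a single score. A minimal sketch, assuming each issue is summarized by the mean of its 1-7 items, could look like the following; the item keys are hypothetical labels for the questions above. Some presence studies instead count the items answered at 6 or 7, so that alternative is included too, again as an assumption rather than the procedure used here.

```python
from statistics import mean

# Hypothetical item keys per issue (one item for issues 1 and 4,
# two items each for copresence and presence).
ISSUES = {
    "similarity": ["similar_to_real"],
    "copresence": ["other_in_space", "computer_vs_people"],
    "presence": ["sense_of_being_there", "space_became_reality"],
    "cooperation": ["sense_of_cooperating"],
}


def issue_scores(responses):
    """Average the 1-7 items belonging to each issue for one questionnaire."""
    for item, value in responses.items():
        if not 1 <= value <= 7:
            raise ValueError(f"{item}: responses must be on the 1-7 scale")
    return {issue: mean(responses[item] for item in items)
            for issue, items in ISSUES.items()}


def high_score_count(responses, threshold=6):
    """Alternative summary: number of items answered at or above `threshold`."""
    return sum(1 for value in responses.values() if value >= threshold)
```

With such small samples either summary is suited only to describing trends across sessions, as in Figure 9.5, not to significance testing.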

The questionnaire responses are shown in Figure 9.5. They are presented only as an
indication of trends, since the sample size is too small for statistical significance
tests. As would be expected, all four indicators increased over time. The virtual
rehearsals appeared to become more and more similar to real rehearsals, with a
greater sense of presence in the rehearsal space, copresence with the others
involved, and a greater degree of cooperation.



Figure 9.5 Questionnaire responses. 


Far more interesting were the qualitative responses obtained from the debriefing
sessions after each rehearsal. Below are some sample comments from Day 1 and,
further on, from Day 4. After the first session, the “alien” nature of the
experience was uppermost in people’s minds, along with the difficulty of getting used to
the controls:

“It affects you on two levels. On the first level, somebody’s smiling at me, 
somebody’s frowning at me, somebody’s shocked or surprised, but then there’s a 
second level that says but did she really mean to press that button? Did she really 
mean to express that? From this particular session...for me...was just getting used 
to the way that it works.” 

“Not having any eye contact with the other actor; being in a room on your own 
with a computer and acting a scene with another person who you don’t see apart 
from on a screen is very strange.” 

“... in a real rehearsal what you’re doing, if you’re working with people, is you 
“eye contact” - there’s something which exists between you. You create that and 
you can do things with that. In this situation that wasn't there at all. Although you 
were sharing something with someone through your headphones you were aware 
of other people but it's quite difficult not being able to see that person as far as the 
acting goes.” 

Yet, among the actors, there was still excitement at the possibilities of virtual 
rehearsals: 
“It's very interesting. It was difficult getting the coordination going. It could be a
lot of fun and the potential of it is very exciting. The way you could go...you know
where you could go with it. There's no boundaries. I think that's really exciting.” 

On the first day of rehearsals David Chambers, Professor of Acting and Directing 
at Yale School of Drama, watched all the actors rehearse and later wrote: 

“It was remarkable watching the actual actors on separate monitors as they 
rehearsed on-screen under the gentle, succinct control-booth guidance of [the 
director]. Of the four couples that I witnessed, seven of the actors clearly reached 
some level of emotional engagement with their virtual partner at some point; one 
or two were almost consistently emotionally expressive and responsive. When 
present, the emotional content was clearly perceivable, vocally and gesturally. The 
lone exception was a young woman who, by her own admission, was computer 
illiterate and completely befuddled by the keyboard and monitor set-up. Her acting 
partner rushed, on-screen via his avatar, to her aid in the VR kitchen and did his 
best to help her through her dilemma. An off-script, rehearsal-like, emotional 
exchange was occurring.” 

By the fourth day the system itself had become transparent for most of the actors 
and they were concentrating solely on the acting. 

“[Interviewer: Did you have any sense of it being any real acting?] Yeah in a funny
kind of way. Yeah. Getting there. It’s kind of becoming, getting its own reality...I
mean we've only done it four times and now that horrible beginning is kind of 
finished and you sort of feel I could work out how to use this and think about how 
I'm doing things or how I could do. You could sort of see that, that then becomes 
quite absorbing and you kind of enter into it.” 

In answer to the same question another said: 

“Yeah, it's getting quite good that way because everyone is getting used to working 
the equipment. Because what happens is that you are suddenly...you are completely 
unaware of what is happening around you. Because when you get used to it that 
becomes the norm. That's the rehearsal space; that's how I can move; that's how I 
can do that; that's all the reactions I've got. So that's all I can use and you just 
accept it...And you're using that and you stop, look around and say “there’s a whole 
world around me; people working away in corners”.”

Nevertheless, one actor believed that the system could never feel like a real
rehearsal:

“It's also getting used to it. I just haven't got used to it....Not just the controls; the 
whole thing of it. The whole experience. And all those questions on the 
questionnaire about: “Does it feel like a real rehearsal?” It doesn't and I don't think 
it ever could feel like a real rehearsal, but you get used to it and it becomes a
different way of rehearsing. A rehearsal in its own right.” 

One aspect of the experience that all agreed on was that it could be used for
“blocking”. Blocking is the process by which the actors and the director map out
the scenario spatially and decide the positions, movements and orientations
that everyone will adopt throughout the scenes. For a filmed production, the
camera directors and crew are also involved in this process, which alone can
take a significant amount of the rehearsal time. A VE scenario has some
advantages over the traditional rehearsal space for blocking. In particular, the full
space can be modeled and made to look far more like the final setting than the
chalk marks and tape typical of the early stages of rehearsal. It
is therefore important that the actors built an internal mental model of the
rehearsal space, even though the experience was desktop based. For example,
when asked to map out the rehearsal room during the interview, one actor said:

“If I imagine it as this size of a room, which it could be...( hand gestures) then, 
obviously, the door’s a bit more central; and then you’ve got the work surfaces 
along the side and you’ve got the toaster and coffee pot, the milk, and you’ve got a 
cupboard up there, the fridge is there, and the window’s there, and there’s the table 
and the newspaper is there. Is that what you mean?” 

We return to the original question posed by the BBC: could actors use such a
system for remote rehearsal, not to replace but to supplement current
rehearsal methods? During the debriefing the actors were asked to
imagine that, while working in New York, they were asked to carry out rehearsals 
in London during weekends. Would they opt for a level of virtual rehearsal 
instead? 

“Oh yeah because it’s as real as it gets; as the machinery allows. If it’s only a 
rehearsal situation. There’s only so much you can do in a rehearsal, once a 
performance begins you’re still building on it. Once you have an audience, whether 
it’s serious drama or comedy the audience will have an effect on how you play it. 
So there is only so much you can do in rehearsal, so doing it in virtual reality is 
fine. You’re working out the moves and where the furniture and the props are.” 

From observation, it is clear that the actors demonstrated emotional expression 
through the use of their faces, body and spatial disposition. Figure 9.6 shows a 
typical example - the woman is standing with her back to the man, looking out of 
the window as he enters the room, to convey a certain mood that belonged to this 
particular subtext. Another example is shown in Figure 9.7. 
Figure 9.6 The woman conveys mood by looking out of the window as the man
enters. 



Figure 9.7 The actors used their virtual bodies to convey emotional states. 


Finally, was there transfer from virtual rehearsal to the real-life rehearsal? The 
answer is almost certainly yes: the directors agreed that what was done in the ten 
minutes of live rehearsal before the audience entered could not possibly have 
produced the performance that was observed. 

This was only an initial experiment, and much could be improved. In
particular, as it was desktop-based, there was no chance for the actors to “reach out
and touch” one another by actually using their real bodies to do so. In real life we
typically do not think about our facial expressions and body language; they just
happen. The advantage of a tracked immersive system, such as a spatially immersive
ReaCTor or an HMD, would be that the actors
could use their full bodies for acting and share the space with the other (remote)
actors in full 3D. A further round of experiments is in preparation.


9.7 Responses to Virtual People 

A very likely trend over the next few years is the interaction between real people 
and virtual people. A well-known example is the Ananova virtual newscaster [16], 
and other areas of development are in sales, education and medicine. A 
fundamental question relating to the success or failure of this technology is the 
extent to which people will suspend disbelief and actually engage with virtual 
characters, and related to that is the extent to which virtual characters will trigger 
the appropriate emotional and psychological responses in real people. 

Strong evidence relating to the interaction between real and virtual people comes 
from the area of psychotherapy, and we review some results in this section. Most 
of the work in the application of virtual reality in psychotherapy has been in 
relation to fears and phobias of “things” or situations involving things - spiders, 
bridges, airplanes, heights (for example [17, 18]). A more interesting and 
challenging application is to social phobias - shyness and various types of social 
anxiety, including fear of public speaking. These phobias can cause enormous 
damage in people’s lives, and are difficult and costly to treat because confronting
these fears means confronting other people. This is both difficult to organize and
at the very heart of the problem itself. Classical exposure therapy gradually
introduces the client to the feared “other”, in this case the experience of speaking
in front of an audience. Yet it is clearly complicated and expensive to find real
people who would be willing to act as an audience for a
client, and there are potentially millions of such clients [19]. A severely phobic
patient needs to gain confidence by interacting with real people in social situations,
but the phobia itself makes this extremely difficult if not impossible. Virtual
reality offers a “half-way house” where a phobic client can interact socially with
virtual people, to gain enough understanding and confidence to confront
real situations later.

The area of therapy, and in particular treatment of social phobia, is also of 
theoretical interest. Virtual reality offers the possibility of “presence” in a virtual 
place and copresence with other people located in remote places - and there has
been a great deal of research in this area (see, for example, [20]). Therapy for 
social phobia cannot work without a strong sense of presence-in-a-place (e.g., the 
seminar room for fear of public speaking therapy) and with other people (the 
virtual audience). Maximizing both types of presence is of great practical utility in 
this type of application. 

Three experimental studies have been carried out at UCL on the reactions of real
people to virtual audiences. We consider each in turn. In the first study, which was
a pilot for the research as a whole, 10 speakers were recruited on the UCL campus
by asking for volunteers who wished to rehearse a short talk of about five minutes
in a “safe” setting provided by a virtual audience [21]. The scenario was, for
example, a job interview, or an examination viva where the person is asked “Please
give a brief description of your current work”. The subjects were free to choose
any topic, and to speak without notes or other prompts.

The purpose of the study was not to provide therapy, but only to see whether the 
subjects would respond with negative affect to negative audience responses and 
with positive affect to positive responses. There were three types of programmed
audience response. In the negative response, the virtual audience members independently
looked away from the speaker, avoiding “eye contact”, fidgeted, fell asleep, shook
their heads, had bored facial expressions, and walked out of the room, accompanied
by the sounds of occasional yawns and moans. The positive reactions
consisted of the audience gazing at the speaker, nodding, smiling, spontaneously
clapping, and generally behaving in an interested way. The mixed response always
started with mainly negative reactions, gradually turned
positive, and ended with a standing ovation. Subjects gave their talk three times:
first experiencing a negative or positive response, then the opposite one, and
finally, in every case, the mixed response.
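The mixed condition can be thought of as a schedule in which the chance of a positive behavior grows as the talk proceeds, ending in an ovation. The sketch below draws each audience behavior at random from the lists described above. The linear ramp, the start and end probabilities, the point at which the ovation begins, and the name `mixed_response` are assumptions for illustration, not the programmed behavior actually used in [21]. Walking out is left out of the pool, on the assumption that a member who had left could not join the final ovation.

```python
import random

# Behaviors taken from the descriptions in the text; the schedule itself
# (linear ramp, start/end probabilities, ovation window) is an assumption.
NEGATIVE = ("look away", "fidget", "fall asleep", "shake head", "yawn")
POSITIVE = ("gaze at speaker", "nod", "smile", "clap")


def mixed_response(elapsed, duration, rng, p_start=0.1, p_end=0.9, ovation_at=0.95):
    """Pick one audience behavior for the mixed condition at time `elapsed`."""
    progress = min(max(elapsed / duration, 0.0), 1.0)
    if progress >= ovation_at:
        return "standing ovation"
    # The probability of a positive behavior rises linearly over the talk.
    p_positive = p_start + (p_end - p_start) * progress
    pool = POSITIVE if rng.random() < p_positive else NEGATIVE
    return rng.choice(pool)
```

Calling `mixed_response(t, 300, random.Random(0))` for each audience member at regular intervals of a five-minute talk would give mostly negative behavior early on and mostly positive behavior near the end. A shared random generator keeps a run reproducible.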

Data was collected by questionnaire on a number of variables, including 
assessments of copresence with the virtual audience, self-rating of their 
performance, ratings of the interest of the audience in what they had to say, and 
standard social anxiety and fear of public speaking questionnaires. 

The results show, as would be expected, that self-rated performance generally 
increases with time - that is, speakers tend to rate their performance as improving 
with each successive trial, on the average. However, independently of time, 
speakers’ self-rated performance, measured across a number of factors, is 
positively associated with their rating of the friendliness and mood of the audience 
and interest of the audience in what they had to say. Now this seems a rather trite 
result: clearly, anyone would rate their own performance better when the audience 
seems interested and friendly than when the audience is not interested and hostile. 
But recall - these are computer-generated virtual people, there are no real people 
there. 

This result was not unexpected, since it is in line with other research suggesting
that people respond to media as if they were social actors [20]. However, seeing this
happen in practice in the experiment was truly stunning. There is so little that can be captured
by questionnaires - the whole body language of some of the speakers was quite 
different after experiencing a positive session rather than a negative session. One 
speaker, on his second time around after previously experiencing a negative 
session, apologized to the audience saying “I am sorry I bored you last time, I will 
try to do better this time”. Another, on seeing the audience walk out on him, said, 
“I can see that you’re not interested in my subject”. Overall, it is the responses that 
we could not capture through questionnaires that were the most significant, rather 
than the pale reflection that questionnaires can give of such an experientially rich 
situation. 

In the second study [23, 24] the purpose was to repeat the first study with a much
larger sample. Forty subjects were recruited from UCL: 12 were assigned to a static
neutral audience, and 14 each to a positive or a negative audience of 8 virtual people.
The static audience were posed in a neutral expressive state. The negative audience 
again carried out extremely negative behaviors such as never making eye contact 
with the speakers, lounging on the desks, carrying out other activities such as
reading or talking to one another, and making comments such as “This is rubbish!” 
in response to the speaker (Figure 9.8). The positive audience (Figure 9.9) was 
over-the-top friendly - frequently nodding and smiling, occasionally bursting into 
spontaneous applause, and making helpful comments such as “Fascinating!”. 
Responses were elicited using standard questionnaires to measure anxiety, and also
by debriefing interviews. In addition, the subjects’ propensity to fear public
speaking was assessed before the experiment using a standard instrument
known as the Personal Report of Confidence as a Public Speaker (PRCS). The
fundamental result is shown in Figure 9.10, which plots standardized pre- and
post-talk PRCS scores. For those who experienced the positive or static audience,
the degree of anxiety during the talk can be predicted from the person’s everyday 
degree of public speaking anxiety. However, for those facing the negative 
audience, the anxiety was uniformly high irrespective of the underlying propensity 
to public speaking anxiety. Whether normally confident or phobic public speakers, 
the negative audience resulted in a high level of anxiety. 



Figure 9.8 Snapshot of the negative audience. 

The third experiment used an active but neutral audience, one that did not engage
in any overt negative or positive behaviors, but which seemed to listen politely
to the speaker, occasionally making eye contact, with the small amount of
fidgeting typical of most real audiences (Figure 9.11). The hypothesis was that
normally confident individuals would show no increased anxiety when speaking to 
this audience compared to speaking to a virtual empty room (the same room, but 
without the audience). Phobic individuals, on the other hand, should show a 
marked increase in anxiety when speaking to a virtual audience compared to just 
an empty virtual room. 

Forty people were recruited, half of them phobic public speakers and the other half
confident, pre-screened on the PRCS. They were distributed into eight cells of a
between-groups factorial design (phobic/confident and with virtual audience/empty
room). As before, they gave a short talk in front of the virtual audience (or empty
room) and then answered questionnaires followed by a debriefing interview. In this
experiment subjects also had their heart rate monitored, and were assessed as to
their subjective impression of their own physiological state including their heart 
rate. The results showed clearly that the phobic subjects had a significantly higher 
anxiety level, actual heart rate, and subjectively assessed heart rate when speaking 
to the virtual audience rather than the virtual empty room. For confident speakers 
this difference did not occur except for a small increase in actual heart rate when 
speaking to the virtual audience compared to the empty room [25]. 



Figure 9.9 Snapshot of the positive audience. 



Figure 9.10 Post-talk PRCS by prior PRCS.


The overwhelming conclusion from these studies is that people do respond to
virtual audiences. An empty virtual room, with no audience, produces no
discernible effect (Study 3). A neutrally behaving virtual audience produces
heightened anxiety in phobic speakers but not in
confident speakers (Study 3). A negative audience produces high anxiety
for both confident and phobic speakers (Study 2), and a positive audience produces
anxiety commensurate with the predisposition of the speaker (Study 2). People
respond independently of the visual sophistication of the avatars - those of Study 1 
were inferior to those of Study 2, and Study 3 had the most sophisticated avatars. It 
is behavior that matters rather than visual appearance. 



Figure 9.11 Third experiment neutral audience. 


9.8 Conclusions 

This chapter has reviewed some experimental work, mostly carried out at UCL in 
collaboration with partners, on the responses of people to virtual environments that 
involve other social actors. The results suggest that people do respond strongly to 
the representations of other people, as people. However, the accord generated 
within a group may not be as strong as it would be if the same people met for real. 
There are very significant impediments to social interaction caused by the limited
capacity for emotional and gestural expression and response. There is evidence to
suggest that immersion enhances the capability for leadership, though this is a 
complex issue and the matter is far from settled. Finally, there is evidence to 
suggest that significant emotional responses can be obtained even when the human 
participants know that there are no real people there, just computer-controlled 
virtual people that have human-like behaviors. In terms of the wide area 
international trials, perhaps the most telling and positive result, overlooked in the 
context of carrying out the experiment, is that the experiments themselves were 
managed using the shared virtual environment systems (DIVE or dVS). These
experiments were logistically complex, and yet were managed in real-time by the 
experimenters using the shared VE system. 

The results of the acting experiment were perhaps the most remarkable. Actors 
who were generally not very computer literate were able to establish rapport with one 
another and with a director, sufficient to rehearse a short play for presentation in front of a 
live audience. The system was minimal, crude, and desktop-based, and yet the 
results show that it was effective. This may be partly due to the special 
skills of actors, who often work under difficult circumstances and are able to find 
a way through problems of self-presentation and coping with others. Nevertheless, 
if such an application, perhaps the most demanding for shared VEs, can work at 
all, then we can be optimistic about the exploitation of this technology as it 
improves in the future. 

The Daleks are a nasty race, interested only in commanding, conquering and 
destroying. It is possible that they behave in this way because of their inability, 
compared with humans, to carry out the normal non-verbal behaviors 
associated with social interaction. It is likely that in years to come a fair proportion 
of the world’s business and social activity will be carried out in extensive shared 
VEs. Therefore we have to make sure that we do not evolve a breed of humans 
who behave like Daleks because of their similarly warped and limited behavioral 
capabilities and potential for action in the virtual world. Much research of this kind 
is essential in order to understand what happens when people meet other virtually 
real people in virtual places. 
Chapter 10 

Collaboration in Multi-modal Virtual 
Worlds: Comparing Touch, Text, 
Voice and Video 

Eva-Lotta Sallnäs 


10.1 Introduction 

The social aspects of virtual reality are an area of research that has expanded as the 
technology has matured. Collaborative virtual environments (CVEs) show great 
promise for investigating how human-human interaction works. The reason for this 
is that the mode of communication as well as task contexts, spatial affordances, 
information presentation and manipulation of common objects can be varied in 
order to understand the effects of - and interrelations between - these factors. The 
communication mode is often text-chat in virtual environments, and audio or video 
channels are used less often. It has only recently become possible to support other 
human senses like touch in three-dimensional virtual environments. In this chapter 
my main interest is in comparing the different communication modes, such as text- 
chat, voice communication and video conferencing, and investigating the effect of 
supporting the touch modality. Evaluation of collaboration through different 
communication modes is not as common in the area of CVEs as in the area of 
telecommunications and computer-mediated communication (CMC). One reason 
for this is that social psychology has not had a large impact on research in the field 
of CVEs. In social psychological studies of mediated interaction, the focus of 
interest is, for example, on how people can build and sustain relations, prevent and 
solve conflicts, and collaborate to attain joint goals [1]. A general argument that a 
number of theories make is that the social richness of the communication medium 
has to be matched with the task in order for collaborators to accomplish these 
different goals [2, 3, 4, 5]. The capacity that a medium has to transmit social 
information, like tone of voice or facial expressions, affects people’s notion of 
social presence. Social presence is defined as the feeling of being present with 
another person at a remote location. This kind of perspective on mediated 
interaction from research in the area of telecommunications and CMC can be used 
as a starting point for studying human-human interaction in virtual reality. 
The three-dimensionality of virtual environments adds a number of specific 
features to traditional communication environments. The virtual environment is 
often perceived as a place in which people can navigate with an avatar, interact 
with objects and obtain information. In order to evaluate the affordances of a 
virtual environment, the concept of virtual presence has been developed. Virtual 
presence is defined as the feeling of being present in a computer-generated 
environment that seems real [6]. In this chapter I will argue that it is relevant 
to compare social presence with virtual presence in hybrid collaborative 
environments where several modes of interaction are provided. The reason is that 
both perceived social presence and perceived virtual presence have been said to predict improved 
performance. It is therefore important to investigate the interrelations between these 
two dimensions and performance, and to measure collaboration 
objectively. 

Studies on mediated collaboration have focused on people's subjective 
perceptions of different communication modes and also people's actual behavior 
when interacting in different modes [7]. Communicating through audio is 
important when collaborating at a distance and improves both task performance 
and perceived affordances in comparison to text-chat [8, 2, 9, 10]. In one study 
[11], various modes of interaction were examined, ranging over audio, handwriting, 
typewriting, video and face-to-face. Results showed that people spoke more than 
they wrote, and that people performed best in terms of time to complete tasks when 
audio was provided rather than text. 

Research has not found such uniform results regarding benefits from using video 
conferencing. It is usually shown that video does not add significant advantages to 
task performance compared to audio [12, 13, 14]. In Chapanis' study [11] it was 
found that, in terms of time to complete tasks and number of words spoken, there 
was no advantage associated with visual access. The advantages of video that have 
been found concern subjective ratings of the qualities of communication modes. One study 
showed that preference ratings for people who met in an audio mode were lower 
than for those who met in a video mode or met face-to-face [15]. Daly-Jones et al. 
[16] found in their study that users were more aware of the presence of their 
partner, could monitor their partners’ attentional status better, and felt that the 
mode of communication aided collaboration more when video was provided. 
Video has also been shown to support informal communication and relation 
building [17, 18, 19]. 

It has been shown in a number of studies of CVEs that supporting the touch 
modality through haptic force feedback improves task performance [20, 21, 22]. 
Haptic sensing is defined as identifying objects through both motor behaviors and 
stimulation of skin receptors [23, 24]. One study shows that if two people have the 
opportunity to “feel” the interface in which they are collaborating, they manipulate 
the interface faster and more precisely [25]. In another study, subjects were asked 
to play a collaborative game in a virtual environment with one of the experimenters, 
who was an “expert” player. The players could feel objects in the common 
environment. They were asked to move a ring on a wire in collaboration with each 
other such that contact between the wire and the ring was minimized or avoided. 
Results from this study show that haptic communication could enhance perceived 
“togetherness” and improve task performance in pairs working together [26, 27]. 
In this chapter two experimental studies are presented that were aimed at gaining 
a better understanding of the social dynamics and experiences of users 
collaborating in virtual environments using different communicative modes. 
Interaction in virtual settings is often informal and not directed at a 
specific task. In the research reported here, the focus is on pairs of people 
performing specific tasks collaboratively in an experimental setting. The research 
questions investigated include how well people accomplish different 
tasks when communicating via different media, and to what extent people perceive 
that they are socially and physically present in a mediated environment when 
communicating via different media. I will also present theories about the extent 
to which people perceive communication media as differing in their support for 
social interaction or in their capacity to convey a feeling of 
physical presence at remote locations. Furthermore, I will discuss to what extent 
these differences affect communication and collaboration in three-dimensional 
virtual environments. Previous studies in the area of telecommunications and CMC 
have shown that the mode of communication has a major impact on mediated 
interaction. In this chapter it will be shown that the mode of communication has 
equally dramatic effects on collaboration in three-dimensional virtual environments. 


10.2 The Social Quality of Mediated Interaction 

In the area of CMC, the interest in the capacity of media to transmit social 
information focuses on the interaction between people through different 
communications media like email, telephone or video conference and also face-to- 
face interaction. Social presence theory [2] evolved through research about 
efficiency and satisfaction in the use of different telecommunications media. Social 
presence refers to the feeling of being socially present with another person at a 
remote location. It has been argued that social presence varies between different 
media and affects the nature of the interaction. Moreover, in relation to virtual 
reality, it has become increasingly recognized that the social dimension of 
perceived presence needs to be investigated specifically. In virtual reality contexts, 
the concepts of togetherness, copresence and social presence are used in order to 
address issues of social interaction [28, 27]. 

Heeter [28] divides the concept of presence into three dimensions: personal 
presence, social presence and environmental presence. Personal presence, 
according to Heeter, is a measure of the extent to which, and the reasons why, 
persons feel as if they are in a virtual world. Social presence refers to the extent to 
which other beings, both living and synthetic, exist in the virtual world and appear 
to react to you. Environmental presence refers to the extent to which the 
environment itself appears to know that you are there and reacts to you. 

The notion of togetherness has been investigated as a sense of people being 
together in a shared virtual space [27]. This notion of togetherness is argued to be a 
counterpart to the notion of one person feeling as if they are present in a virtual 
environment. People perceive togetherness in a common environment if one 
person’s alterations in the environment are clearly perceived by the other person or 
if changes are a result of collaboration. Modes of interaction are argued to be 
important for enhancing the sense of togetherness, with the tactile channel 
sometimes given as an example. 

Short et al. [2] regard social presence as a single dimension that represents a 
cognitive synthesis of several factors that occur naturally in face-to-face 
communication. Among these is the capacity to transmit information about tone 
of voice, gestures, facial expression, direction of gaze, posture, touch and non-verbal 
cues, as they are perceived by the individual to enhance presence in the 
medium. A large number of experiments were conducted to test people’s perceived 
social presence when communicating through the different modes available at that time. 
The ranking of media suggested by social presence research at the time (Short et al. 
conducted their research in the 1970s) was, from higher to lower: face-to-face, 
television (video), multi-speaker audio, telephone (also speaker phone and 
monaural audio), and finally business letters. Short et al. [2] did not regard this 
ranking as conclusive, as it can depend on the conditions of testing 
and the context. This kind of ranking of media is also problematic because new 
combinations of media, and support for more human modalities, can change 
it. In another study, it was found, contrary to expectations, that text 
messages were preferred over voice messages in reducing uncertainty and 
resolving equivocal situations [29]. 

The level of social presence is the extent to which a medium is perceived as sociable, 
warm, sensitive, personal or intimate when it is used to interact with other people 
[2]. The degree to which people perceive a medium as supporting social presence 
relates to the purpose of the interaction, and influences the medium chosen by 
an individual who wishes to communicate. Hence social presence theory proposes, 
among other things, that people are more or less aware of the social 
capacity of a medium, and that they choose a medium that they perceive to be 
appropriate for a given task or purpose [2, 3, 4, 5]. 


10.3 Presence Defined as Being There 

In contrast to the notion of social presence, there is also the notion of presence, or 
the feeling of being, to a greater or lesser degree, physically inside a computer-generated 
environment that feels like reality. Achieving this is one of the aims of virtual 
reality development, especially in developing immersive virtual environments. The 
notion of presence is linked here to a state of consciousness, the psychological state 
of being there in a virtual environment [30, 31]. In some research the aim is for a 
person to feel physically present in the media environment - to the extent that the 
person should not be able to distinguish between the real and the mediated world 
[32]. Thus a person is subjectively present in an environment if the person 
perceives that he or she is physically present in that environment. Witmer and 
Singer [6] define presence as the subjective experience of being in a place or 
environment, even when one is physically situated in another. This notion of being 
present in locations that are not physical has also been called virtual presence [33]. 

The information in the media environment must be meaningful to maintain the 
individual's focus and sense of presence. Further, distracting events in the physical 
locale must be limited, or it must be possible for the individual to integrate them in 
the virtual environment in a meaningful way. The concept of involvement thus 
becomes important for the degree of perceived presence. Involvement depends on 
the degree of significance or meaning that individuals attach to stimuli, activities, 
or events [6]. As users focus more attention on the virtual reality stimuli, they 
become more involved in the virtual reality experience, which leads to an increased 
sense of presence. 

The degree of immersion also affects perceived presence. Immersion is a 
psychological state characterized by perceiving oneself to be enveloped by, 
included in, and interacting with an environment that provides a continuous stream 
of stimuli and experiences [6]. A virtual world that produces a greater sense of 
immersion will produce higher levels of presence. Factors that affect immersion 
include isolation from the physical environment, perception of self-inclusion in the 
virtual environment, natural modes of interaction, and control and perception of 
self-movement. 


10.4 Aspects of Usage in Multi-modal Virtual Worlds 

In virtual environments it is possible for people to orient themselves in relation to 
each other in the common environment. This means that a person’s intentions can 
be inferred from his or her spatial orientation in the environment. Unlike in 
video conferences, in a CVE a comfortable distance between the avatars can be 
chosen dynamically during the interaction. Subgroups can be formed, which 
corresponds more closely to a face-to-face situation. It is also possible to 
see that a person is focusing on certain objects and information points just from 
body position, or from the manipulation of an object in virtual environments [34]. In this 
way communication about content is easier because not every item has to be 
explicitly introduced in the dialog. People can also handle objects with their 
avatars in the common environment. In this way virtual environments provide 
means by which people can not only communicate directly, but also use deictic 
reference by talking about objects that are visible to all communicators, and finally 
communicate through an artefact by manipulating it [35]. 

It is also important to consider the usability issues of virtual interfaces in order to 
optimize people’s efficiency in performing tasks in virtual environments. The most 
common measures of efficiency in human computer interaction are the time it takes 
users to perform tasks and the precision with which these are performed. It is also 
possible, for example, to analyze the dialog between collaborators and measure the 
frequency of words used, the number of pauses, or the amount of overlapping speech. 
Other issues to address include people’s capability to obtain and understand information 
in virtual environments, information that can be visual, auditory or tactual, as well as 
people’s ability to understand spatial representations and their navigational 
efficiency [36]. Factors that affect efficiency in task performance are task 
characteristics, user characteristics, and capabilities and limitations of the human 
sensory and motor system. 
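The dialog measures mentioned above (word frequency, pauses and overlapping speech) can be computed from a time-stamped transcript. The following Python sketch is only an illustration: the `Utterance` record, the function name `dialog_measures` and the 2-second pause threshold are hypothetical choices, not measures defined in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    start: float  # seconds from the start of the session
    end: float    # seconds from the start of the session
    text: str


def dialog_measures(utterances, pause_threshold=2.0):
    """Count words, pauses and overlaps in a time-stamped dialog.

    A pause is a silent gap of at least `pause_threshold` seconds between
    utterances; an overlap is an utterance that starts before the current
    speaker has finished and comes from a different speaker.
    """
    utts = sorted(utterances, key=lambda u: u.start)
    words = sum(len(u.text.split()) for u in utts)
    pauses = overlaps = 0
    latest_end = None      # latest end time seen so far
    latest_speaker = None  # speaker who holds that latest end time
    for u in utts:
        if latest_end is not None:
            gap = u.start - latest_end
            if gap >= pause_threshold:
                pauses += 1
            elif gap < 0 and u.speaker != latest_speaker:
                overlaps += 1
        if latest_end is None or u.end > latest_end:
            latest_end, latest_speaker = u.end, u.speaker
    return {"words": words, "pauses": pauses, "overlaps": overlaps}
```

For example, if one speaker starts talking half a second before the other has finished, and the pair then falls silent for three seconds, the sketch records one overlap and one pause.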
Enhancing interactivity through additional modes of communication in virtual environments is 
generally said to improve task performance [37] and increase social presence [2]. It 
has been argued that the level of virtual presence is a function of the number and 
fidelity of the sensory inputs to the user [38, 39, 40]. However, it has also 
been argued that interactivity brings with it an increased workload that could 
impede task performance [36]. 

Some research has shown that there are individual differences in people's 
tendency to perceive virtual presence in media. These differences depend on 
factors like how extroverted or introverted subjects are, or their tendency to get 
involved in narratives. There may also be differences in experience and knowledge 
of media, cognitive style, and level of sensation seeking [41] that can affect 
individuals’ tendency to perceive virtual presence. For navigation in virtual spaces, 
spatial memory has been found to be an important factor for predicting how well a 
naive user will succeed in solving various tasks when interacting with a visual 
interface for the first time [42]. Generally task performance in virtual reality can be 
said to be improved if it is more efficient than in alternative systems - or if 
objective measures of task performance can be enhanced within the same type of 
system. 


10.5 Effects of Interaction Modes on Collaboration 


10.5.1 Communication Modes in Virtual Worlds 

In most virtual worlds the means by which people communicate is text-chat. An 
experimental study was conducted in order to investigate to what extent 
collaboration in a desktop CVE is affected by an audio communication channel or 
video connection in comparison to text-chat. 

A between-group design was used and sixty subjects participated in the 
experiment. They were assigned to pairs with one woman and one man in each 
pair. Subjects collaborated in pairs and performed a decision-making task together. 
The subjects did not know each other prior to the experiment. There was one 
independent variable: the CVE condition (video, voice or text-chat). 
The subjective dependent variables were perceived social presence, 
perceived virtual presence and perceived task performance, which were measured 
by questionnaires. The objective dependent variables pertaining to the 
interaction were the time taken to finish the task, the frequency of words used in 
communication, and the number of words per second. 
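The between-group design described above, in which each mixed-gender pair is exposed to only one condition, can be sketched as a random allocation procedure. This is a hypothetical illustration: the chapter does not describe how pairs were allocated, and the function `assign_pairs`, the participant labels and the equal split of the thirty pairs into ten per condition are assumptions.

```python
import random

CONDITIONS = ["video", "voice", "text-chat"]


def assign_pairs(women, men, conditions=CONDITIONS, seed=None):
    """Form one-woman-one-man pairs and allocate each pair to exactly one
    condition (a between-group design), with equal numbers per condition."""
    if len(women) != len(men):
        raise ValueError("need equal numbers of women and men")
    if len(women) % len(conditions) != 0:
        raise ValueError("pairs must divide evenly among conditions")
    rng = random.Random(seed)  # seeded generator for a reproducible allocation
    w, m = list(women), list(men)
    rng.shuffle(w)
    rng.shuffle(m)
    pairs = list(zip(w, m))  # random pairing of strangers
    per_condition = len(pairs) // len(conditions)
    # Pairs are already in random order, so consecutive slices form random groups.
    return {
        cond: pairs[i * per_condition:(i + 1) * per_condition]
        for i, cond in enumerate(conditions)
    }


women = [f"W{i}" for i in range(30)]
men = [f"M{i}" for i in range(30)]
groups = assign_pairs(women, men, seed=1)
```

With sixty subjects this yields ten pairs in each of the three conditions, and no subject appears in more than one condition.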

Two PowerBook computers networked via Ethernet were used in the experiment. In the 
voice condition, two telephones with headsets were used. In the video condition, 
two 21-inch television monitors were used, plus two telephones 
with headsets providing the voice communication. In the text condition the subjects 
communicated only via the text-chat feature of Active Worlds. The CVE 
was constructed in Active Worlds (www.activeworlds.com), and the environment 
consisted of an exhibition with information points (posters and QuickTime movie 
clips with audio) (see Figure 10.1). Human-like avatars represented the subjects. 
The interaction between subjects was video recorded via a video splitter in such a 
way that the dialog and the video image of both subjects’ faces, shoulders and their 
navigation in Active Worlds were visible and audible synchronously on the same 
screen for later analysis. Recordings were made in this way in all three conditions. 

The decision-making task was presented to the pairs of subjects as a written 
scenario: “You have participated in a competition in which you and another 
participant performed equally well. You will therefore share the first prize: a Volvo 
car of your choice with insurance and gas for one year. You will use the car every 
second month. You will not be able to sell the car. The organizers of the 
competition now want you to go into a virtual exhibition and choose the car that 
you are going to share. You should then decide together which of you is going to 
have the car during the first month.” 

The dialogs were transcribed from the video recordings. The data from 
questionnaires and observations were analyzed using one-way ANOVA (analysis 
of variance). Questionnaires from all sixty subjects were analyzed, whereas the 
observations from which the objective measures were obtained were made of only 
twenty-four pairs of subjects. 
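A one-way ANOVA compares the variance between condition means with the variance within conditions. The following Python sketch computes the F statistic from first principles; the function name `one_way_anova` and the toy numbers in the example are illustrative only and are not the study's data. In practice a library routine such as SciPy's `scipy.stats.f_oneway`, which also returns a p-value, would normally be used.

```python
from statistics import fmean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each argument is a sequence of observations from one condition,
    e.g. completion times for the pairs in that condition.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    all_obs = [x for g in groups for x in g]
    n_total, k = len(all_obs), len(groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = fmean(all_obs)
    means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of condition means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own condition mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy example: three groups with clearly separated means.
print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```

The omnibus F test only says that the conditions differ somewhere; statements such as "text-chat differed from voice but video did not differ from voice" require follow-up pairwise comparisons, and the chapter does not say which procedure was used for them.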

The results showed that task performance, defined operationally as total task 
completion time, differed significantly between the text-chat and the video 
condition and between the text-chat and voice condition, but not between the video 
and the voice condition. It took the longest time, approximately 29 minutes, to 
perform the tasks in the text-chat condition, and the shortest time, approximately 
10 minutes, in the voice condition. The video condition took approximately 16 
minutes. That text-chat is a much harder medium to communicate in, and 
that collaboration is slower in this condition, may not be a surprising result. 
However, most communication in virtual reality is still text-chat, even though it is 
obviously awkward to both communicate and navigate with one's hands. These 
results show that collaboration that involves joint decision making benefits 
significantly from a voice communication channel. Collaboration was a bit faster in 
the voice than in the video condition, but this difference was not significant. 



Figure 10.1 The exhibition with information points in the Active Worlds 
environment. 
As regards the frequency of words used, the results show a significant difference 
between the text and the video condition, and between the text and the voice 
condition - but not between the video and the voice condition. The frequency of 
words used was highest in the video condition (M=1243) and lowest in the text- 
chat condition (M=342), with the voice condition (M=1084) in between. Not only 
is the communication in the text-chat condition slower, but the dialog also contains 
significantly fewer words than in the video and voice conditions. When analyzing 
the dialog, we see that it is more task-oriented in text-chat, and more socio-emotional 
in the voice and video conditions [43]. Although the difference was not 
significant, fewer words were used in the voice than in the video condition. 

When we analyzed words per second, we found significant differences 
between voice and text-chat, between video and text-chat, and between the voice and video 
conditions. There were most words per second in the voice condition (M=1.9), 
fewer words per second in the video condition (M=1.3) and by far the fewest 
words per second in the text-chat condition (M=0.2). In their study, Chapanis et al. 
[44] show that communication in a voice medium is wordier than in a text medium, 
which means that far more words are produced per minute. The fact that the largest 
number of words was used in the video condition, and that pairs took longer in this 
condition than in the voice condition, suggests that the voice condition is the 
most efficient, but that the video condition might be a more enjoyable medium. 
Results obtained by Davies, as referenced by Short et al. [2], showed that telephone 
communication is more efficient than face-to-face communication, which is 
lengthier and wordier. 
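As a rough consistency check, dividing the reported mean word count by the reported approximate completion time reproduces the ordering of the words-per-second means. The sketch below uses only figures given in the text; the dictionary and function names are illustrative. The estimates need not match the reported means exactly, because a mean of per-pair rates is not the same as a ratio of two means, and the completion times are only approximate.

```python
# Condition means as reported in the text: mean word count, approximate
# completion time in minutes, and the reported mean words per second.
WPS_MEANS = {
    "text-chat": {"words": 342, "minutes": 29, "wps": 0.2},
    "video": {"words": 1243, "minutes": 16, "wps": 1.3},
    "voice": {"words": 1084, "minutes": 10, "wps": 1.9},
}


def ratio_of_means_wps(words, minutes):
    """Words per second estimated from the condition means alone."""
    return words / (minutes * 60)


for condition, d in WPS_MEANS.items():
    estimate = ratio_of_means_wps(d["words"], d["minutes"])
    print(f"{condition:10s} estimate {estimate:.2f}  reported {d['wps']}")
```

The estimates come out at about 0.20, 1.29 and 1.81 words per second: close to the reported 0.2 and 1.3, and in the same order, but below the reported 1.9 for the voice condition, for the reasons given above.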

Another hypothesis investigated in this study was whether the 
conditions differed in subjects’ perceived task performance. This 
dimension was measured by a questionnaire, and the items were added together. 
The ratings of perceived task performance did not differ significantly between the 
video and the voice condition or between the voice and text-chat condition. There 
was, however, a significant difference between the video and the text-chat 
condition. The mean rating for overall perceived performance on a seven-point 
Likert-type scale was highest for the video condition (5.5), somewhat lower for the 
voice condition (5.0) and lowest for the text-chat condition (4.5). Thus 
subjects perceived it to be harder to collaborate in text-chat than in the video 
condition. 
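The scoring described here, adding up the items of a questionnaire dimension and reporting a mean per item on the seven-point scale, can be sketched as follows. The function name `scale_score` and the example ratings are hypothetical; the chapter does not list the individual items or say whether any were reverse-keyed, so this sketch assumes none were.

```python
def scale_score(item_ratings, scale_min=1, scale_max=7):
    """Score one questionnaire dimension for one subject.

    Returns the summed score and the mean per item; the per-item mean stays on
    the original 1-7 Likert-type scale, so dimensions with different numbers of
    items can be compared. Assumes no reverse-keyed items.
    """
    if not item_ratings:
        raise ValueError("a dimension needs at least one item")
    for rating in item_ratings:
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} outside {scale_min}-{scale_max}")
    total = sum(item_ratings)
    return total, total / len(item_ratings)


# Hypothetical ratings from one subject on a four-item dimension.
total, per_item = scale_score([5, 6, 5, 6])
print(total, per_item)  # 22 5.5
```

Condition means such as the 5.5, 5.0 and 4.5 reported above would then be averages of these per-item scores over the subjects in each condition.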

The hypothesis that the three conditions (video, voice and text-chat) would 
differ regarding subjects’ perceived social presence was partly supported. The 
dimension of social presence that we measured with a questionnaire differed 
significantly between the text and video conditions and between the text and voice 
conditions, but not between the video and the voice conditions. Items in the 
questionnaire were analyzed together as a total dimension. The mean value per 
item on a seven-point Likert-type scale was 5.2 in the video condition, 5.1 
in the voice condition and 4.3 in the text-chat condition. People perceived 
themselves to be more socially present in the video and voice conditions than in the 
text-chat condition in the Active Worlds study. That people perceive themselves to 
be less socially present when communicating via a text medium is supported by 
Short et al. [2]. However, the finding of this study that there is no 
difference between the video and voice media in relation to social presence 
contradicts the findings of Short et al. [2]. A tentative explanation might be that the 
spatiality of the virtual environment compensates for the lack of a video channel. This 
explanation, however, needs to be verified empirically in future studies. 

One hypothesis was that there would be a difference between media conditions 
regarding perceived virtual presence. The total dimension of perceived virtual 
presence measured by the questionnaire differed significantly between the text and 
video conditions and between the text and voice conditions - but not between the 
video and voice conditions. The mean value for each question on the seven point 
Likert-type scale was 4.4 in the text-chat condition, 5.0 in the video and 5.0 in the 
voice condition. People perceived themselves to be more virtually present in the 
video and voice condition than in the text-chat condition in Active Worlds. The 
idea that virtual presence is a function of the number and fidelity of the sensory 
inputs to the user [38] is therefore only partly supported by the results of our study. 
Voice communication clearly increases the perceived virtual presence in 
comparison to text-chat - whereas video communication does not add any value in 
this situation. 


10.5.2 Supporting Touch in Collaborative Environments 

Haptic feedback is a natural ingredient in face-to-face interaction between people, 
and it also serves important functions for communication. One example of haptic 
communication is when a person hands over a precious artifact; people then rely to 
a large extent on haptic perception for recognizing that the object has been 
successfully received. Handshakes and a pat on the back are powerful 
communicative events that are symbolic and convey information about relations, 
status and emotional states. It is also intuitive for people to combine gestures, 
deictic references and joint manipulation in collaborative environments. 

An experimental study was performed in order to test the hypotheses that a 
distributed CVE supporting the touch modality will increase perceived virtual 
presence and social presence, improve task performance and increase perceived 
task performance [45, 46]. 

A between-group design was used, and the independent variable in this 
experiment was the interface condition, with two treatments: CVE-voice-haptic and 
CVE-voice-only. Pairs in both conditions performed the same five collaborative 
tasks, with the two subjects of each pair seated in different locations. The haptic 
devices used in the tests were two PHANToM 1.0 devices (Figure 10.2). 

In the first condition, which included haptic force feedback, the subjects obtained 
haptic force feedback from the dynamic objects, the static walls and the other 
person in the CVE. The subjects could simultaneously manipulate the dynamic 
objects, which were modeled to simulate real cubes with form, mass, damping and 
surface friction. The subjects could also hold on to each other by pushing a small 
button on the part of the haptic device with which they also manipulated the 
virtual cubes. In the second condition the subjects had no haptic force feedback and 
could not hold on to each other; the haptic device then functioned as a 3D mouse. 
Voice communication in both conditions was provided through a telephone 
connection using headsets. 



Figure 10.2 Subjects are doing the tasks using two versions of the PHANToM, on 
the left a “T” model and on the right an “A” model. 


Task performance was measured by the total time it took the pairs of subjects to 
perform the five tasks, and also by the frequency of failure to lift cubes together - 
which was used as a measure of precision. These measures were obtained through 
analysis of video recordings of the experimental sessions. The subjects' perceived 
task performance, perceived virtual presence and perceived social presence were 
measured by questionnaires. 

The results showed that haptic force feedback significantly improved task 
performance: the tasks were completed in less time in the haptic 
force feedback condition. Pairs took an average of 24 minutes to perform the five 
tasks in the haptic force feedback condition, as against 35 minutes in the condition 
with no haptic force feedback. An analysis of frequencies of failures to lift cubes 
together as a measure of precision in task performance shows that it is significantly 
more difficult to coordinate actions with the aim of lifting objects in a three- 
dimensional desktop virtual environment without haptic force feedback. Results 
show that there is a significant difference between conditions regarding subjects' 
ability to lift cubes when building one cube out of eight cubes and when 
constructing two piles with eight cubes. In the haptic force feedback condition, 
subjects failed to lift cubes on average 4 times when building a cube and 7 times 
when constructing two piles. In the condition without haptic force feedback, 
subjects failed to lift cubes on average 12 times when building a cube and 30 times 
when constructing two piles. Thus a major part of the difference in the time it took 
to perform tasks can be explained by the fact that subjects’ precision when lifting 
cubes without haptic force feedback was lower. 
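The averages reported above can be compared with a few lines of arithmetic. The Python sketch below uses only the means stated in the text (24 vs 35 minutes; 4 vs 12 failed lifts when building a cube; 7 vs 30 failed lifts when constructing two piles). The dictionary layout and the function names are hypothetical, chosen for illustration, and the ratios say nothing about statistical significance, which the study assessed separately.

```python
# Mean values reported in the text for the two conditions.
# The layout and key names are illustrative only.
REPORTED = {
    "haptic":     {"minutes": 24, "fail_cube": 4,  "fail_piles": 7},
    "non_haptic": {"minutes": 35, "fail_cube": 12, "fail_piles": 30},
}


def ratio(measure: str) -> float:
    """How many times larger the non-haptic mean is than the haptic mean."""
    return REPORTED["non_haptic"][measure] / REPORTED["haptic"][measure]


def percent_reduction(measure: str) -> float:
    """Percentage by which the mean drops when haptic feedback is added."""
    base = REPORTED["non_haptic"][measure]
    return 100.0 * (base - REPORTED["haptic"][measure]) / base


if __name__ == "__main__":
    for m in ("minutes", "fail_cube", "fail_piles"):
        print(f"{m}: x{ratio(m):.2f}, -{percent_reduction(m):.0f}%")
```

With these figures, adding haptic feedback lowers mean completion time by roughly 31%, but lowers failed lifts by roughly 67% (building a cube) and 77% (two piles). The much larger drop in failures is consistent with, though it does not by itself prove, the claim that lower precision accounts for much of the time difference.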

The questionnaire that measured perceived performance showed that the subjects 
in the haptic force feedback condition perceived themselves to be performing the 
tasks significantly better. The mean value for each question on a seven-point
Likert-type scale shows that subjects perceived their task performance to be higher 
in the three-dimensional visual/voice/haptic condition (5.9) than in the three- 
dimensional visual/voice only condition (5.1). Supporting haptic force feedback in 
a distributed collaborative environment makes manipulation of common objects 
both faster and more precise. There are clear connections between the ease with 
which people manipulate objects together and the time it takes to complete the 
tasks. The results also show that haptic force feedback in a collaborative 
environment makes task performance more efficient. Earlier results in studies 
investigating one person interacting with a system providing haptic force feedback 
have shown similar results [20, 21, 22]. A further study has shown that two people 
manipulate an interface faster and more precisely if they can feel the interface [25]. 

The analysis of data from the virtual presence questionnaire shows that 
conditions differ significantly. The subjects’ mean rating of perceived virtual 
presence was higher on a seven-point Likert-type scale in the three-dimensional
visual/voice/haptic condition (5.4) than in the three-dimensional visual/voice only 
condition (4.4). Haptic force feedback thus adds significantly to people’s perceived 
virtual presence even in an environment that supports voice communication. An 
example of this is the observation that emotional expressions of failure were
much less frequent in the non-haptic environment when people did not manage to lift
the cubes. People seemed to be more disappointed when failing to lift the cubes in
the haptic environment.

The dimension of social presence measured by a questionnaire did not differ 
significantly across the two conditions when the items were analyzed together as a 
total dimension. The mean rating on a seven-point Likert-type scale of the total
dimension for social presence was higher for the three-dimensional
visual/voice/haptic condition (5.3) than for the three-dimensional
visual/voice only condition (4.8). This means that people only perceived
themselves to be marginally more socially present in the haptic environment. The 
results in the study by Slater and Wilbur [30] indicate that haptic force feedback 
increased perceived togetherness between people in a collaborative environment. 
However, voice as a communication medium was not provided in that study, which 
suggests that voice possibly overshadows the effect of haptic force feedback to a 
certain extent. 


10.6 Conclusions 

Human-human interaction is attained through a complex set of behaviors that 
originate in face-to-face communication. This can be substituted to a greater or 
lesser extent by features in communication systems. Mediated interaction is always 
a compromise and problems arise when the functionality is too limited for a 
particular type of interaction, or when the system fails to substitute the 
functionality that it is intended to support. One example of this is the parallax 
problem in video communication when a video conference fails to support people 
in establishing eye contact. It may be futuristic to imagine that people would like to 
interact with each other represented by avatars. However, in relation to avatars, the 
aim of the representation may not be to look as realistic as possible to your
collaborator but to support body positioning, deictic reference and manipulation of 
common objects. 

Supporting human communication modes is important for human-human 
interaction in virtual environments - as it is in all collaborative environments. 
Text-chat is problematic as a synchronous communications medium. When 
communicating through text-chat, people do not perceive themselves to be 
virtually present to a large extent. The dimensions in our questionnaire imply that 
they do not perceive the environment to be as interactive, that they do not have as 
much control in the CVE, and that the environment is not as engaging or 
meaningful as compared to a video or voice conference. Text-chat also makes the 
interaction in the environment less social. Thus the dimensions in the social 
presence questionnaire imply that people do not feel that they can interpret the 
other person’s emotional state equally well. They have a hard time conveying their 
feelings and emphasis and do not build up interpersonal relations as well compared 
with video and voice conferences. Making joint decisions in the text-chat condition 
is time consuming and crude. Dialogs are much scarcer in text-chat and discussions 
preceding decisions are not as extensive. 

Communicating via video or voice is perceived as being equally good in relation 
to people’s perceived virtual and social presence. But there is a significant 
difference between video and voice in terms of the frequency of words spoken per 
second. Collaborating in a CVE which supports voice communication takes the 
least amount of time and the most words per second - but people spend more time 
in the video condition and have the lengthiest dialogs. This would suggest that the 
video condition is in fact different from the voice condition (even if the present 
experiment did not distinguish this by means of subjective ratings). The video 
condition might be a more enjoyable medium - whereas the audio condition is the 
most efficient for this task. 

Earlier studies of collaborative environments that were not related to CVEs have 
found similar results: that providing audio communication makes an important 
difference compared to text-chat, whereas the effect of adding video is not as 
evident [8, 9, 12, 13, 14]. However, the interrelations that are reported here 
between text-chat, audio and video communication extend these results to CVEs. 

Haptic force feedback seems to be important for collaboration in a CVE - at least 
when the task is manipulating objects together. Perceived virtual presence and 
perceived performance are increased - and efficiency and precision are improved - 
when touch is supported. The fact that perceived social presence was only 
improved to a small extent might be explained by the fact that voice is by far the 
most important medium for conveying social information. 

Not many studies have investigated how haptic force feedback affects 
collaboration in a CVE. However, results from single user studies also show that 
performance is improved by providing haptic force feedback [20, 21, 22]. The
study presented here suggests an interesting cumulative effect of audio
communication and haptic force feedback on perceived social presence that needs
further investigation [26, 45].

All pairs managed to complete tasks in both the Active Worlds study and the 
haptic force feedback study. People rated their own performance to be significantly 
higher in the video mode compared with the text-chat mode in the Active Worlds
study. People also rated the haptic CVE to be significantly easier to use than the
non-haptic CVE. The result that people perceived themselves to perform better in a
haptic environment is much stronger than the differences found in the
Active Worlds study. Nevertheless, the tasks were different in the two studies and it
is hard to evaluate performance in joint decision-making compared with 
performance in the manipulation of objects. 

It can be argued that haptic force feedback is more important than video for 
virtual presence in a CVE with voice communication - since video did not add any 
value in this regard whereas haptic feedback did. It should be noted, however, that 
the two studies involved very different tasks and somewhat different questions in 
the questionnaires. It can be hypothesized that the Active Worlds scenario, which
involved joint decision-making, depends less on haptic feedback, whereas the task of
manipulating cubes might not have benefited much from video communication.

It is a challenge to understand all aspects of how support for different human 
modes of interaction affects collaboration in a CVE. However, studies which 
systematically compare tasks in different CVE conditions can make a start in this 
direction. 
11.1 New Technologies: Hype and Hope

With the introduction of the Internet, many people saw the beginning of a new era; 
an era of democratization, of status equalization, and of freedom of speech for 
everyone. The accessibility of other places and people via the medium of 
computers, as well as the absence of many social and status cues, would make it 
possible for people to reach out to other people across the national and social 
borders that govern our face-to-face interaction. We would turn on our computers - 
and instantly be connected not only with our offices, our colleagues, and our 
families and friends, but also with people all over the world with whom we would 
make exciting new acquaintances. 

This hype about the Internet as a world-embracing technology that will remove 
many barriers (national, cultural, etc.) and equalize status differences between 
people closely resembles the widespread ideas about the telephone when it
began to be widely used about a hundred years ago. At that time, the limitlessness 
was seen both as a positive and negative feature of the new technology. So, for 
example, it would bridge cultures, but it would also be like an open door and 
would make it possible for dishonest people from the lower class to make phone 
calls into honorable American homes. Claude Fischer, who has written a history of 
American telephony [1], has argued not only against some of these, nowadays 
merely amusing, fears and hopes, but also against the belief that telephone 
technology has significantly changed American society. The reason it did not,
Fischer argues, is that people did not change their communication behavior
with the advent of the telephone and start to communicate with strangers far away, 
but instead used the telephone to maintain already existing, mostly local ties. 

The same kind of hype, as already mentioned, has also accompanied the early 
days of the Internet. This has partly been negative, a warning against opening the 
internet door to anyone, entering in any disguise; but partly it has also been 
positive, suggesting that the anonymity and accessibility of the Internet will change 


R. Schroeder (ed.), The Social Life of Avatars 
© Springer-Verlag London Limited 2002 



The Digital Divide 




how we interact with other people, and that the connections with other people via 
computer-mediated communication (CMC) will be more informal, less 
conventional and less hierarchical than face-to-face interaction (e.g., [2, 3, 4]). One
of those who have introduced this idea into the area of virtual environments (VEs) is
the psychologist Sherry Turkle, who argues, in relation to a study of text-based 
online communities (a multi-user dimension or MUD), that online environments, 
because of their anonymous nature, allow people to take on identities other than 
their “real” ones, and that they go beyond the borders that govern our offline 
behavior in their online interactions [5]. This freedom from identity, Turkle
argues, allows people to be whoever they like online, and to interact more freely 
with other people beyond the constraints of the conventional social structure that 
silences women, discriminates against the unattractive, and excludes the introverts. 
However, if we look more closely (and use other, more quantitative and empirical 
methods) at the actual interaction in text-based as well as graphical online systems, 
the ‘freedom from identity’ theory seems to disappear and other, more
conventional interaction patterns emerge. Schiano [6], as well as Schroeder and
Axelsson [7], has shown that users of different online systems do not change 
identity as often, or take on as many different identities simultaneously, as Turkle 
claims. Rather, the more time a user spends in a VE, the more stable is the user’s 
identity. Also, as these two studies [6, 7] as well as Becker and Mark [8] have 
pointed out, online behavior is not as unconventional and unconstrained as is often
thought, but rather similar to offline interaction. Schiano [6] found, for example, 
when studying a text-based MUD, that people spend most of their time in the VE 
meeting people they already know and inviting them to their virtual “homes” to 
chat - not exactly an unconventional form of behavior! 

However, I do not want to equate the idea of unconventional behavior with status 
equalization, or claim that the first is a prerequisite of the second. Nevertheless, I 
would argue that an environment which promotes alternative social behavior is 
something that could also support more equal interaction between people. 
Inequality, after all, is mostly a question of obedience towards socially constructed 
rules, and not a law of nature. 


11.2 Status Differences 

In recent years, together with colleagues, I have carried out a number of studies 
concerning social interaction in multi-user VEs. Our aim has been to arrive at a 
wide-ranging understanding of how social interaction in VEs relates to other kinds 
of interaction - both unmediated (face-to-face) interaction and other kinds of 
mediated interaction. To do this, we have used different methods and different
theories, and have also studied different VR systems: immersive
CAVE-type VR systems, and different kinds of non-immersive desktop systems. It 
is of course possible to argue against this all-embracing approach, and maintain 
that findings concerning immersive and non-immersive VR systems cannot be 
compared since the systems provide such different conditions for interaction. 
However, on a general level, the systems have much in common: they both provide 
the users with a virtual space within which they can interact and virtual objects and 
spaces to interact with. Further, as Taylor points out in Chapter 3, they provide
virtual bodies which make it possible to pursue these actions. In other words, VR 
systems can be seen as electronic media, not unlike other media (e.g., telephone, 
email), that connect distributed people and enable interaction between them. Here I 
will only present a few examples of these interactions, drawn from a number of
studies that are reported in more detail elsewhere.

Before introducing the next section, where I will present the findings that relate 
to status differences in VEs, I would first like to consider definitions of “status”. 
One useful definition in this broad context, where the studied groups are of 
different size as well as in different situations (i.e., the group members are 
gathering on a different basis, spontaneously or participating in arranged 
meetings), is to think of high status, on the one hand, as implying a tendency to 
initiate ideas and activities that are taken up by the other members of the group [9, 
10] and, on the other, as involving a positive evaluation or ranking by the other 
group members [11]. However, instead of regarding these two definitions as 
mutually exclusive, I agree with Brown [12] that they can be seen as two 
interrelated perspectives on the status evaluation process, since an observable 
influential behavior is also often highly ranked when it comes to status. One could 
also see these two aspects in relation to two methods of evaluation, objective and 
subjective, such that one can easily measure objectively how many ideas or 
conversation topics each member of a group brings up, or whose ideas and topics 
the group as a whole adopts. 

When analyzing the data collected in our different studies, we have used both the 
methods mentioned in order to pin down the actual relationship between the group 
members. The evaluation that follows is subjective, however, in the sense that it is 
a result of individuals involved in an activity assessing their own and other 
people’s status. When it comes to the objective measures, we have not quantified 
the social behavior in the sense mentioned above (i.e., measuring how many ideas 
of a group member are taken up, etc.), but nevertheless made careful and
systematic analyses of text communication logs and audio recordings. I would 
therefore argue that this method should be regarded as objective since it is not a 
question of self-reports or evaluations by individuals personally engaged in an 
activity, but of interpretations by researchers who are located outside it. 

Before presenting my findings relating to status differences, I would also like to 
clarify some features of the data. As mentioned, I refer to studies that are carried 
out using different VR systems: immersive and non-immersive systems. In the 
non-immersive internet-based graphical systems, the participants communicated 
via text, while in the immersive CAVE environment the participants used voice 
communication. The text communication was logged, but when I use citations, I 
have anonymized both the user names and the names of places. Also, since online 
text conversations have a tendency to stretch out over a considerable space, as a 
result of the many conversations going on simultaneously, I have simply removed
the lines that are of no interest to this essay and show only the relevant lines. The
voice communication from the trials involving immersive VR systems was
recorded, and I have transcribed, anonymized and translated it from Swedish
into English.
11.2.1 Technology Makes a Difference

In the area of networked multi-user VR, where people are virtually copresent in the 
same environment but physically distributed, people can use various types of 
technical systems (or, to be more precise, several cooperating technical 
components) to input position, movements and communication into the shared 
space. For example, when using a non-immersive VR system, the user generally 
uses a mouse and/or keyboard, that is, only the hands are used for navigation in 
and manipulation of the environment, whereas in an immersive system the user’s 
body movements are tracked via sensors on head, hand and sometimes other body 
parts. Navigation in - and manipulation of - the VE can be easily achieved by 
pointing in a certain direction and by moving either the whole body or the hand 
while pressing a button on a joystick or the like. The output from the computer (for 
example, visual or audio) also varies depending on the system. While a
non-immersive VR system provides the user with a 2D or 2½D image on a computer
screen and often poor sound or no sound at all, an immersive system offers a 3D 
image and in some cases 3D sound that surrounds the user. However, when people 
meet in a shared virtual space, linked together via networks and with different 
computer input and output systems, there is nothing in the graphical environment 
informing the users about what system the partner or partners have. For example, 
the “avatar”, the graphical representation of an immersed user, looks the same as 
the avatar of a non-immersed user, and a user with a high-bandwidth connection 
has the same appearance as a user with a low-bandwidth connection. The 
implications of this, and what can be done about the problems related to it, will be 
illustrated later in this section. 

For now it is enough to point out that: 1) people who use shared VEs often do so 
via different technical systems; and that 2) the character of these systems affects 
how the users interact with the VE and with other users sharing the same space. 
The more powerful the technology (e.g., higher transmission speed, more - or 
easier to handle - input modes, higher degree of immersion, etc.), the more 
influential the user becomes in the interaction with the VE and with other users. A 
study by Slater et al. [13] of a word puzzle-solving task involving three participants
showed that leadership differed between a virtual setting, in which the more
immersed participant (using a head-mounted display) was singled out as the leader,
and the same task performed in the real setting, where no one was singled
out as the leader. We found the same result in a study [14] of a puzzle-solving task
between two participants (one using an immersive CAVE-type system and one 
using a non-immersive desktop system) where the participants were asked to do a 
Rubik’s cube-type puzzle - putting together 8 blocks with one of 6 different colors 
on each side to form a cube such that each side of the completed cube displayed a 
single color. In the CAVE-type system, participants could move the blocks by 
putting their hand, holding a 3D mouse, into the virtual block, pressing a button of 
the 3D mouse and moving their hand in the desired direction. Navigation was 
purely by moving around physically. On the desktop system, participants could 
navigate by moving the middle mouse button and to move the blocks, they had to 
select the blocks by clicking on the block with the left mouse button, pressing the 
right mouse button, and moving the mouse in the desired direction. They could 
also rotate the cube by pressing the right mouse button combined with the shift
key. In both systems, users were represented by identical human-like avatars and 
could communicate via telephones using headsets. The participants did not know 
what type of system the other person was using. 



Figure 11.1 Two people solving the virtual puzzle together. 


As in Slater et al.’s study [13], we found that when carrying out the task in the 
virtual condition, the immersed person was singled out (by both participants) as 
being more active in solving the task, while when carrying out the same task in
reality, with cardboard blocks, both participants considered themselves equal.

As already mentioned, there is nothing in the environment that reveals what 
kind of system the users in a shared VE are using. However, when one studies the 
interaction between the users in a collaboration situation, one can see repeated 
examples of how different systems affect the interaction. Below is a typical 
example from the trial described above where one participant (person C) used the 
immersive and intuitive-to-use CAVE-type system whereas the other participant 
(person D) used the non-immersive desktop system which was much more difficult 
to handle. What often happened, as this example shows, was that person D failed to 
manipulate a virtual object and person C then offered his/her help and immediately 
- and with great ease - carried out the action. Since it was extremely easy for 
person C to manipulate the objects in comparison to person D, person D often 
expressed a feeling of low self-confidence, leaving person C to manipulate the 
objects and to decide how to go about solving the task. When person D in the 
example below forgets to “let go of” an object that had previously been picked up
so that person C can pick it up, person D shows some slight embarrassment 
(laughs) when becoming aware of his/her mistake. 

D: Den här den här är blå...

(This one, this one is blue...)

C: Mm

(Mm)

D: ... som jag håller i nu, ska vi se om jag kan vrida den så...

(... the one I’m holding now, let’s see if I can turn it so...)

C: Släpp den och ta ta nästa tag

(Let go of it and take the next grip)

D: Ja, fast jag vill vrida den som den blå är åt samma håll som

(Yes, but I want to turn it in the same direction as the blue one is
turning)

C: Om du släpper den så kan jag vrida den
(If you let go of it I can turn it)

D: Ja okej då, den blå är diagonalt uppåt...

(Yes, okay then, the blue turns diagonally up...)

C: Ja
(Yes)

D: ... också
(... too)

C: Ja, men du har inte släppt den än
(Yes, but you haven’t let go of it yet)

D: Nej, det har jag inte [skratt], så

(Right, I haven’t [laughs], there you go)

Also, the fact that the systems differ from each other and that participants are not 
aware of this often generates misunderstandings, as we shall see in the next 
example from the same trial. What often happens is that one person assumes that 
the other person has the same system and capabilities, or, as in the example below, 
that the non-immersed person (person D) gets the sense that person C can do things 
in the VE that s/he cannot do (e.g., bending down to look at the blocks from below, 
pointing at objects with the hand holding the tracked 3D mouse). However, since 
both persons assume that they are using the same kind of system, the 
misunderstandings are hardly ever worked out; instead, person D very often ends 
up not knowing whether it is the system’s or his/her own fault that things do not 
seem to work properly, as in the following example. And since the system should 
be working correctly in an experimental situation, it is likely that D blames
him/herself for the difficulties experienced.

C: Den blåa ser jag väldigt tydligt
(The blue one I can see very clearly)

D: Den blåa... var?

(The blue... where?)

C: Den där
(That one)

D: Den? Oj, du kan peka - hur gör man det?

(That one? Wow, you can point - how do you do that?)

C: Jag pekar med handen
(I point with my hand)

D: Vadå, alltså mus... pilen?

(What, you mean the mouse... cursor?)

C: Ja
(Yes)

D: Om jag pe... pekar jag på samma nu eller?

(If I poi... am I pointing at the same one now or what?)

C: Jag ser inte dig nämligen
(I can’t see you, in fact)

Another example of inequalities due to technology differences that I came across is 
in a study of language encounters in an online graphical multi-user VE, where 
people from different cultures meet and interact. People interacting with each other 
do not know what kind of system other users have. Instead, they take for granted, 
as in the situation above, that their own system is the norm and that everyone else 
in the VE can do, see, and hear things the same way they can. This, of course, is 
not the case. On the one hand there is the hardware and software that is 
individually purchased, and may differ between people on a micro-level. The 
technology can generate inequalities between people, as in the following example 
from the on-line graphical multi-user VE system Active Worlds (AW). AW is an 
internet-based system which allows interaction between users in a 3D
computer-generated VE. The communication is mainly text-based but each participant is
represented by an avatar, a 3D body, and users can also gesture to each other with 
their bodies. AW today consists of more than 600 interlinked virtual worlds that 
are typically used by between 100 and 300 users at any time of the day. The system 
has also been in continuous use since 1995 (for a sample of internet-based 
graphical virtual systems, see www.ccon.org). 

In the following example one user plays music to a group of users, assuming that 
they can all hear it, but when the question arises, it turns out that only one other
user in the group can. The other two users who are present have missed the music and thus
something central in the social interaction of the group. 

Lone wolf: have you heard any of the tunes? 

LaurelLee: yes 
TANYA: noooooo Lone 
LaurelLee: i play it 
LaurelLee: on my land 
TANYA: when 

Side step: i really should get this sound card working on this NT box

There are, on the other hand, also inequalities that depend on differences on a 
macro-level, such that there are countries with dense high-speed and inexpensive 
networks for telecommunications, and other countries where the networks are not 
as fast, cheap or widespread. 

The example below, also from AW, shows how one user, Greenway, has serious 
problems taking part in the social interaction due to a slow internet connection. The 
other users are engaged in playing with their avatars and shifting appearances 
quickly to amuse themselves (which is very common in online VEs, see, for 
example, [15]), an activity which can be said to be not only a trivial game, but 
something that helps people to express and discuss identity in a playful way in a 
mediated situation. However, due to the slow internet connection, Greenway’s 
computer-generated images do not update properly. Instead of seeing a colorful
and detailed image of an avatar, Greenway sees only a black triangle in the 
position of the avatar. 
Not being able to take part in a social activity like changing avatars does, of
course, give the user a less enjoyable and less meaningful online experience
compared to other users that are better off when it comes to technology (again, this 
can be seen in the example below). Also, as in previous examples, other users are 
quite unaware of the technical problem but see only its consequences - a hindered 
social performance. 



Figure 11.2 Three people engaged in a conversation in Active Worlds. 


Greenway: 1 kb/minute - just great
TANYA: how r u greenway
Greenway: u gotta excuse my 4 my slow reaction, my modem is .almost dead
Side step: rut ro greenway...what are you connected in at?
Greenway: i have a radio modem.... it’s almost free but....it suxxxx
Side step: lol...radio modem..hmmm i’ve seen it but never really read specs on it...
Greenway: in Poland (is where i live) a good internet conection is science-fiction
Greenway: unless u’re a millioner
Greenway: i’m not
Greenway: the other line i have is fast but is costs about 1$ per 2 hours
Greenway: thats much here
Greenway: can u change a avatar right now ?
Side step: yes i can greenway i have a bunch to choose from
Greenway: i still see black triangles , mostly
Greenway: i’m losing all the fun
Greenway: :-(((
jill: green why is that
Greenway: i hasen’t loaded yet
Greenway: i thin it never will


11.2.2 Anyone Italian? 

As more people connect to the Internet, more cultures meet and interact. However, this does not automatically create a multilingual meeting place online; instead, it creates a place where English rules and other languages exist at the periphery. In online graphical multi-user VEs like AW, users who speak languages other than English either have to establish their own places (so-called national worlds like “France”, “Italy”, etc.) and meet there, or switch to the dominant English language to be able to interact with other users. In short, not knowing English in online situations is a disadvantage, while knowing English well is an advantage. One could, of course, argue that there are places in AW where non-English-speaking people can meet, but these places are few and, as I mentioned before, quite peripheral, while the central places of most systems are predominantly English-speaking. So if, as an AW user, one wants to be influential in the central activity of the system, one is definitely disadvantaged, if not excluded, if one has poor English skills.

When entering a non-English conversation in AW, as in most offline social 
interaction, a native English-speaking person can almost always keep on speaking 
English, since most non-English speakers have good, or at least satisfactory, skills 
in speaking English. 

In the following example, taken from one of the AW worlds, a conversation in Swedish suddenly switches to English when a native English speaker enters. The English-speaking person expresses his/her aversion to the foreign language and tells the speakers to switch to English, which they do immediately.


Mikael: Hej GC...Allt väl?
(Hi GC, everything all right?)

GoodCake: allt är bra
(everything’s fine)

GoodCake: du?
(and you?)

Kango: arrrgh speak englihs

Mikael: Hi Kango..*S*..Sorry


However, when the opposite happens, and a non-English speaker enters a conversation held in English, people hardly ever switch to the newcomer’s language and only occasionally try to find some other way of communicating. The following example illustrates this, and shows how socially disadvantaged non-English speakers (such as several of the participants here) are in an English-dominated setting.
guardia: chi e italiano ?
(anyone Italian?)

Rose lee: hi guardia

guardia: I'M ITALIAN

Rose lee: nice to meet you guardia

guardia: CHI e ITALIANO ??
(ANYONE ITALIAN?)

Rose lee: <sorry speaks only english

Siro: sorry only english i have a enough trouble with that


What one can conclude is that native English speakers are privileged in Internet-based multi-user VEs (as argued in greater detail in [16]). When an English speaker enters a non-English conversation, the conversation changes into English; when a non-English speaker enters an English conversation, the conversation stays in English. As a non-English speaker you have to adapt to other people, or use a second language, to make yourself heard, whereas as an English speaker you can almost always use your first language. This, of course, has implications for the social interaction. If one cannot express oneself quickly or precisely enough in a conversation, one’s role in the situation is definitely affected: one is less keen on engaging in discussions of complex topics, less eager to tell jokes, and less willing to represent the group in public. This also applies to offline communication, and in a sense the effect is even stronger there, since the face-to-face presence of other people increases the second-language speaker’s embarrassment. At the same time, language is definitely more important and central in online interaction than in face-to-face meetings [17]. This, together with the fact that offline social conventions like embarrassment are often brought online (e.g., [18, 13]), makes the inequalities arising from differing language skills different from offline ones, but no less problematic.


11.2.3 Status and Stratification 

The most frequently asked question in online graphical multi-user VEs is probably “A/S/L?” - age, sex, and location. This is a quick way of getting a sense of whom one is interacting with. In offline interaction this question would seem rather ridiculous, since parts of the answer are obvious. And although one can seldom tell where people come from by what they look like, this is not usually one of the first questions people ask when meeting others for the first time: once people begin a conversation, language and accent are quite revealing, and we can often make a good guess about people’s origins after a short exchange. Online, however, people need to ask explicitly where others come from, since the lack of social cues (e.g., language, accent, physical appearance) prevents us from quickly forming an impression of other people.
This question is also a way of deciding how to continue contact with the person: whether to establish a relationship or to call it off at an early stage. Adults
online may not want to establish a relationship with a child (and vice versa), 
women are not always keen on engaging in relationships with men, and people in 
general seem to be more willing to interact with people from the same country or a 
nearby country (geographically or culturally). Online, one comes across a 
reluctance to associate with people with certain backgrounds as well as a readiness 
to interact with people with other backgrounds. For example, people have not 
infrequently made comments about my allegedly trustworthy Swedish nationality 
and referred to this as a reason for interacting with me, in the following way: “I 
trust you...you’re a Swede...” (comment from a male avatar in Active Worlds 
when I asked him how, after only a short acquaintance, he could let me share his 
building privileges in the VE, which would mean that I would be able to build in 
his name and also vandalize his buildings). 

Categorizations that play an important role offline in shaping the social 
interaction, like age, sex and ethnicity, do not cease to exist when people enter a 
VE and become avatars, as many had hoped, but become even more influential 
online due to the lack of social and status cues [19]. Since we do not get a very 
nuanced picture of the people we meet, we tend to rely, more than offline, on 
stereotypical images of people as a first approximation of who we are interacting 
with. When going online, we do not leave our presuppositions about people 
behind, but rather bring them with us and allow them to rule the social interaction 
more strongly. 

Thus, when talking about status differences in VEs, we can, on the one hand, point to the inequalities that we bring into the system, such as crude notions of which nationalities are trustworthy and which are not. On the
other hand, we can identify status differences which have little or no relevance 
outside the system, but which are very real and important within the system. These 
system-dependent status differences are sometimes brought into the system by the 
system administrators themselves, for example, with the establishment of certain 
functions in the online community like “GateKeeper” and “PeaceKeeper” in the 
AW system or “Wizards”, “Gods” and “Magicians” [20, 21] in certain MUDs. 
These functions are associated with certain responsibilities and rights in the system 
(e.g., keeping the peace by ejecting unruly users from the system, developing the 
community by programming and implementing new features, etc.) and they give 
these users a higher status. They also give them more influence over the 
technology, and hence over social interaction, than ordinary users. 

In AW, the system administrators have also implemented another distinction, between tourists and citizens: that is, between users who can use the system for free with limited privileges, and users who pay an annual fee to enjoy certain privileges within the system. These include the ability to send telegrams and files to other citizens, to build and own property that cannot be deleted by other users, to maintain a contact list to keep in touch with other users, and to reserve a unique citizen name for one’s own use. (The complete list of privileges for AW citizens can be found on the Active Worlds web pages.) One could argue that these privileges are of little importance - they do not give any power over other users but are only for an individual’s use. However, when one
looks at how online social interaction in AW works, one can understand that these
privileges are powerful. Without having a stable identity (supported by the ability 
to keep one unique name), an ability to keep in touch with the people one meets 
(supported by the contact list) and without the means to protect one’s home 
(supported by the building protection), one is more or less isolated from the social 
life of other avatars. 

As has been shown in several recent studies, people in online VEs socialize in 
much the same way as they do offline: they get together with people they know, 
and often do so in their own homes [6, 7]. This is much more difficult without the 
social privileges of citizenship. Also, since the standard AW avatar is reserved for 
tourists, other users can easily tell who has these social privileges and who hasn’t; 
or, who is keen on and able to engage socially and who isn’t. As one regular user and citizen once explained to me, when talking about whom she chooses to interact with online, she does not want to engage with tourists, since they are just people “passing by”.

Another divide between users that is not implemented by the administrators, but 
which has emerged from the social interaction between users, is the difference 
between “insiders” and “outsiders”. “Insiders” can be described as users who not 
only have the formal privileges of citizens, but who also enjoy the informal 
privileges of having a certain affiliation, belonging to a subgroup within the larger 
community where “they talk and behave with each other in a knowing way, 
building on their familiarity with the conventions of talking in and interacting with 
the virtual world and their adeptness at navigating through it and manipulating it” 
[22]. “Outsiders” can be described as users who are not members of a subgroup, 
but who have only a “fleeting acquaintance with the virtual world [that] does not 
allow or enable them to participate in the more close-knit social networks of 
insiders” [22]. Taylor, in Chapter 3, points out that users of the graphical multi-user system “Dreamscape” sometimes signal this affiliation to a certain subgroup through their avatars (for example, through color choices, bodies, accessories, or heads).
This can also be seen in the AW system, where owners of certain “theme” worlds 
provide their users with unique avatars, and where the “insiders” themselves set up 
and defend certain rules and norms for behavior, such as who is allowed to use 
certain names containing status signs or letters, or to use particular avatars 
conveying messages to other “insiders”, or when to speak and what kind of 
language style to use. An example of the latter is contained in the following 
exchange, where “sia” is a new user, unfamiliar with the conventions of the theme 
world. 

Damon: welcome to [name of a theme world] little one *smiles* 

sia: Thank you Damon 

Damon: its Captain Damon sia.. or Master on other worlds.. 

Regal Rio: then drop the "i’U"...always talk about yourself in third 
person..."this girl" or" a girl"...etc... 

As early as 1997, Schroeder [22] suggested that virtual worlds produce “stratification”, that is, that different groups of users develop distinctive behaviors and roles that distinguish them from other groups with a different status or with a different sense of cultural cohesion. The examples that I have put forward here illustrate the same phenomenon, though I would like to put more emphasis on the technical design of the system and its relation to social interaction since, as we have seen, it is very much the technical features that determine the social stratification.


11.3 Conclusions and Recommendations 

Having argued that there is a digital divide between users in VEs, and having presented a number of inequalities between users, it is appropriate to ask why this is an important topic. Perhaps the most important reason is that the future of computing will be networked. In the last five years or so, we have seen the computer develop from a stand-alone machine into a networked device that cooperates with others. In the future, we will no doubt use computer technology that is even more interlinked and less self-contained than it is today.
Another reason for shedding light on this issue is that computer networks will 
continue to link together different kinds of computer systems. Input and output 
devices, network speed and access, software, and combinations of components will 
be different on either side of the computer network. Thus we will in the future also 
have different capabilities when interacting with others via computer networks. 
Yet another reason for highlighting this issue is that more people and more 
nationalities will be using the Internet in the future, which will mean more 
languages and more cultures online, interacting with each other. This will entail a 
multi-lingual and multi-cultural challenge to the dominance of English, but also a 
risk that status differences offline will continue determining online interaction. 

However, before I conclude the discussion of the “problem” of status differences, 
this could be a good place to raise the question whether the problem is a problem at 
all. That is, is inequality something negative that one should in all instances try to 
counteract? Or could it be the case that inequality can be something positive? In 
other words, is equality between group members always something we should 
strive for? 

In certain situations involving two or more people, it can be advantageous for 
group members to have an asymmetrical relationship, such as different roles and 
status [23]. For example, in a teaching situation, a teacher and his/her students have 
different and rather unequal roles. The teacher has the right to speak but students 
usually have this right only when the teacher gives it to them. This division of roles 
facilitates teaching. In other situations, outside the classroom for example, this 
division would seem strange and unnecessary since an informal chat does not need 
to be directed in the same way teaching does. 

However, as I mentioned in the introduction, the lack of social cues in networked multi-user VEs has led many to believe that status differences have also vanished. This may be correct to some extent, since we cannot, as in non-mediated situations, determine at a glance who is a child, a black person or a woman. Instead, as we have seen, there are other status differences and inequalities to take into account online, such as unequal technological capabilities and status differences due to one’s social role (e.g., citizen or tourist in AW) in a particular environment.

My point in this chapter is therefore not to argue for an unreserved equality in all 
online situations, or to oppose status differences in every situation, but instead to 
point out that there are other, medium-dependent inequalities operating apart from 
those with which we may be familiar. 

I would also like to suggest some potential solutions to the problem of status 
differences in VEs - for situations when one would like to overcome them, and to 
foster more equal collaboration. The problem of status differences in virtual 
environments that has been discussed from various vantage points in this chapter is 
not a single problem but several. As we have seen, we not only import offline 
social behavior, but we also create new forms of interaction and new inequalities 
online that are related to the technology. In view of the complexity of the problems 
of status differences, there is, moreover, no single solution to coping with them but 
several. In the final section of this chapter, I will suggest a few potential solutions, 
and also point to a few examples of how both users and developers have already 
made various attempts to deal with status differences themselves. 

Before I do this, I would like to mention that I am far from pessimistic about the 
possibility of creating a more equal interaction in VEs. I am, however, a strong 
opponent of the view that has been held for too long and by too many, that the 
Internet and VEs will automatically - without any human intervention - lead to 
equal interaction. My view is that technology on the one hand, and social 
conventions on the other, influence social interaction, online as well as offline. I 
believe that in order to create and support more equal interaction, we need to be 
aware of this and to work actively towards greater equality. I do not believe that we 
will ever succeed in establishing fully equal online interaction just by configuring
the technology, since there are also many personal and structural factors involved 
in the process. Nevertheless, I think we should give it a good try since the goal is 
so precious. 

A first step towards an equalization of the status differences in VEs is to help 
users become aware of the fact that technology shapes social interaction; put 
differently, that different technical possibilities create different capabilities for 
taking part in social interaction. One way of increasing the awareness of people using networked graphical multi-user VEs is to build a shared view (one’s own and the other user’s) into the technology itself, giving users the possibility of sharing others’ experiences. This has been tried quite successfully in
web applications [24] as well as in VEs [25], but could be developed for even 
better performance. One possibility would be not only to allow people to share 
viewpoints, to see from the perspective that others have, but also to let them share 
the quality of the picture or the technical conditions for a better understanding of 
what they actually experience and why they behave as they do. Another solution 
could be to attach a text box to the avatar (to be viewed on demand) where his/her 
technical system, language skills, and other personal characteristics important to 
the interaction are presented. 

This kind of suggestion has, of course, drawbacks as well as advantages. One drawback could be that the feeling of presence would probably decrease, because other views and text information would be brought into the environment. Another problem would, of course, be that text information about system capacities is very difficult for most people to understand and to interpret meaningfully in terms of what it actually means for the user’s performance.

Another solution to inequalities between users with different language skills, especially in online graphical VEs that use text, would be to integrate online translation programs (such as Babel Fish or Babylon Translator) into the system to help people communicate more easily. However, since English speakers
are very dominant, and have been for a long time, there is perhaps little hope that 
they will use a software program to communicate in Spanish or Russian. It is more 
likely that this would become a tool for non-English speakers to try to keep up with 
conversations in English, or an excuse for English speakers not to learn a foreign 
language. 

Last, but not least, apart from the technical possibilities for equalizing status differences and supporting disadvantaged people, I would simply like to stress the importance of increasing VE users’ awareness of this problem. By talking and writing about it, I think we can create an awareness of the fact that other people are often in situations different from one’s own, and that these conditions often result in an experience of inequality in social interaction. The awareness that one’s own situation is probably different from one’s partner’s could, for example, lead not only to a more explicit presentation of one’s own capabilities and limitations (as in the first example below), but also to a more attentive attitude towards other people (as in the second example), ensuring that there is a shared understanding of the (virtual) world.


(1)

C: John du ska veta en sak att jag är färgblind så det att det kan ställa till lite problem, men jag ser gult
(John I want you to know that I am colour-blind and that can make it a little bit problematic, but I can see yellow)
och jag ser och jag förstod att det var en blå men sen kan det vara och grön tror jag också jag ser
(and I see and I understood that it was a blue one but it could also be and green I think I see too)

(2)

C: Du ser hur jag står?
(You can see how I stand, right?)

D: Ja. Du ser mig med va?
(Yes. You can see me too, can’t you?)

In this chapter, by means of a few examples, I have tried to highlight and 
problematize the notion that VEs and other online environments could, because of 
their informal nature, equalize the differences among users. As I have shown, it is 
not quite that simple. First, the technology that generates and transmits the VEs can 
in some cases itself be the cause of status differences. Secondly, users tend to bring 
offline social conventions, behavior and inequalities into online interaction. And 
third and finally, users seem not only to bring old conventions and behavior with
them into this new situation, but also create new inequalities online. 

However, even if there has been much hype about the potential of the Internet, there is also, as I have shown, a great deal of hope for VEs’ ability to create and maintain a context in which people can interact equally. Still, the
Internet will not automatically generate equal interaction. To be able to bridge 
successfully the digital divide between users and support alternative, informal and 
less hierarchical relations, the Internet needs developers and users who are aware 
of these issues and who are ready to work actively and constructively towards a 
new technical and social order. 
Chapter 12

The Social Life of Small Graphical 
Chat Spaces 

Marc Smith, Shelly Farnham and Steven Drucker 


12.1 Introduction 


12.1.1 Overview 

This paper provides a unique quantitative analysis of the social dynamics of three chat rooms in the Microsoft V-Chat graphical chat system. We used survey and behavioral data to study user experience and activity: 150 V-Chat participants completed a web-based survey, and data logs were collected from three V-Chat rooms over the course of 119 days. These data illustrate the usage patterns of graphical chat systems, and highlight the ways physical proxemics are translated into social interactions in online environments. V-Chat participants actively used gestures, avatars, and movement as part of their social interactions. Analyses of clustering patterns and movement data show that avatars were used to provide non-verbal cues similar to those found in face-to-face interactions. However, use of
some graphical features, in particular gestures, declined as users became more 
experienced with the system. As we shall see, these findings have implications for 
the design and study of online interactive environments. 


12.1.2 Background 

Text chats lack the non-verbal cues, such as gestures, physical distance, and direction of eye gaze, that facilitate face-to-face conversations. Graphical chats attempt to address these limitations by introducing surrogate representations for physical bodies and spaces [1, 2]. While a number of graphical chat systems have been created, little is known about the nature of social interaction in publicly accessible spaces [3, 4, 5].
What do people do in graphical chat spaces? Do they cluster together in patterns
approximating those seen in face-to-face interaction? How are the graphical 
features used in concert with textual modes of interaction? Broadly, we want to 
investigate whether these spaces are sociopetal, drawing people together into 
interaction, or sociofugal, driving them apart and away from interaction with one 
another [6]. To address these questions we report the results of survey research and 
analyses of more than three months of log files gathered from within three rooms 
(Lobby, Lodge, and Red Den) in the Microsoft V-Chat graphical chat system [7]. 

V-Chat clients connect to Internet Relay Chat (IRC) channels for communication 
transport. IRC is used to carry text chat as well as information about graphical 
events including avatar location and gestures. V-Chat provides a representation of 
each room as a 3D space, linked to a text chat window (Figure 12.1). Each space 
can contain up to 25 simultaneous internet users. V-Chat allows users to puppet a 
graphical representation of themselves, an “avatar”, in the 3D space. All users 
within the same room can see each other’s messages (with the exception of 
“whispers” which are private point-to-point messages), irrespective of the distances 
between avatars. Every avatar can also potentially see every other avatar, depending on line of sight. Traditional IRC users lack an avatar in the space, but appear in the user list and text box. People are able to select a standard avatar
provided by the program, an avatar created by another user, or to create a custom 
avatar of their own. V-Chat avatars are represented by “sprites” (as 2D 
representations of users in a 3D space have become known), which have twenty 
frames, allowing them to communicate both direction of view in the 3D space and 
a series of gestures. 



Figure 12.1 V-Chat interface includes a chat text box, chat history window, 3D 
space containing other avatars, room occupancy list, and an image of one’s own 
avatar. 


While V-Chat lacks object persistence, interactive objects, or user extensibility of 
the environment, it does implement many of the core features found in a broad 
range of graphical interaction tools. As such, an investigation of actual user 
behavior in V-Chat can shed significant light on the nature of social interaction in 
3D virtual spaces. 
Our investigation provides a longitudinal study of user behavior as well as analyses of user behavior overall. These results bear on central design and system management issues related to the development of 3D graphical environments for social interaction.

Our work follows the studies of physical social spaces pioneered by William H. 
Whyte [8, 9]. Whyte’s studies highlighted the ways people moved through and 
came to rest in parks and plazas and how social interactions, from the casual to the 
intense, were shaped by design choices and the structure of the space. 

We examined user behavior focusing on three issues: 1) the general usage 
patterns of the chat room participants; 2) the use of 3D features of V-Chat; and 3) 
the contrasts between text-only users and users of the 3D features of V-Chat. 


12.2 Research Methods 

We address these issues by using both survey data and quantitative analyses of user behavior. While the survey data provide insight into users’ subjective experience, quantitative analyses provide a more objective representation of chat behavior. Such quantitative analyses are distinct from ethnographic studies, which take the form of direct observation of participant behavior and activity in the virtual space. While ethnographic studies provide valuable information about the content and meaning of social relationships, they have significant limitations [10, 2]. Direct observation is labor-intensive, misses many forms of interaction and patterns that are difficult to observe from a first-person view, is subject to the biases of the observer, and often lacks broad context or duration.

Quantitative analyses of log file data provide a useful complement to such 
ethnographic studies. Collected logs of user activity can be used to produce a broad 
range of measures of the social structure and dynamics of interaction in the world. 
Combined with qualitative data, these measures can provide a broad backdrop for a 
multi-layered and complex picture of what really goes on in these graphical spaces. 
On their own, quantitative measures at least provide a possible basis for future 
comparison between varieties of graphical interaction systems. 

For the present study we gathered data from three of the more popular V-Chat 
spaces, the “Lobby”, “Lodge”, and “Red Den”, using a logbot. The data we report 
was gathered from 10/22/98 at 12:38:38 PST until 1/16/99 at 17:47:07, a total of 
119 days. The bot had no avatar in the space but did show up in the user list (as 
“LogBert”). A sign was placed in every room being logged announcing the data 
collection and pointing to documents that described the project. These rooms were 
selected because they were the most active of all the rooms available from the 
public Microsoft V-Chat servers. The system did not require users to enter any of 
these rooms in order to access others. Nonetheless, the “Lobby” was listed as a 
default choice in the V-Chat user interface. 

The bot received the same information as all of the V-Chat clients; it added a 
time stamp and wrote the data to a set of files. Private communication between 
users provided by the whisper command was invisible to the logs we collected. 
Logs contained the following information for each V-Chat event: 
TIME, DATE, NAME, ACTION, ARGUMENTS, X, Y, Z, Rotation

These logs were analyzed to generate a series of reports and graphs that profiled 
users, user sessions, and avatars. Log files were aggregated on the basis of the 
events and other world states to produce a range of behavioral measures. 
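The aggregation of logged events into user sessions can be illustrated with a short Python sketch. It is not the authors' code and rests on stated assumptions: each log line is taken to be comma-separated in the field order listed above, DATE and TIME are assumed to use the formats MM/DD/YY and HH:MM:SS, and arrivals and departures are assumed to carry the hypothetical ACTION values `JOIN` and `PART` (borrowed from IRC terminology; the chapter does not name the actual values). A session runs from a user's arrival to that same user's next departure, which mirrors the definition of session length used in the results.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime

# Hypothetical ACTION values for arriving in and leaving a room; the text
# does not say how the V-Chat logs actually label these events.
ARRIVE, LEAVE = "JOIN", "PART"


@dataclass
class Event:
    when: datetime
    name: str
    action: str
    args: str


def parse_log(text: str) -> list[Event]:
    """Parse comma-separated TIME, DATE, NAME, ACTION, ARGUMENTS, X, Y, Z, Rotation lines."""
    events = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) < 5:
            continue  # skip blank or truncated lines (the logs were noisy)
        time_s, date_s, name, action, args = row[:5]
        # Assumed formats: DATE as MM/DD/YY, TIME as HH:MM:SS.
        when = datetime.strptime(f"{date_s} {time_s}", "%m/%d/%y %H:%M:%S")
        events.append(Event(when, name, action.upper(), args))
    events.sort(key=lambda e: e.when)
    return events


def session_minutes(events: list[Event]) -> dict[str, list[float]]:
    """Pair each arrival with the same user's next departure; lengths in minutes."""
    open_sessions: dict[str, datetime] = {}
    lengths: dict[str, list[float]] = {}
    for e in events:
        if e.action == ARRIVE:
            open_sessions[e.name] = e.when
        elif e.action == LEAVE and e.name in open_sessions:
            start = open_sessions.pop(e.name)
            lengths.setdefault(e.name, []).append((e.when - start).total_seconds() / 60)
    # Departures with no recorded arrival (users who appeared without login
    # events) and arrivals that never close are simply dropped here.
    return lengths


if __name__ == "__main__":
    sample = (
        "12:38:38, 10/22/98, alice, JOIN, , 0, 0, 0, 0\n"
        "12:40:00, 10/22/98, alice, SAY, hiya, 1, 0, 2, 90\n"
        "12:45:14, 10/22/98, alice, PART, , 1, 0, 2, 90\n"
    )
    print(session_minutes(parse_log(sample)))  # {'alice': [6.6]}
```

Pairing arrivals with departures by user name is the simplest choice; because handles were non-persistent and the logbot itself dropped out, a fuller analysis would also need rules for closing sessions left open by such gaps.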

The data files were fairly noisy. The logbot was often disconnected from the server, introducing data dropouts and skewing login counts when it automatically logged back into the spaces. The data sent to clients was also noisy: many users appeared without login events. Position data was of fairly low resolution, providing coordinates of avatars in motion only once per second. The resulting jumpy motion in the data is an artifact and does not reflect users’ experience of their own motion, but it does accurately reflect the way other users’ motion was presented. Additional issues raised by the nature of the data are discussed below.

Survey data were collected from a self-selected sample of 150 V-Chat users. Respondents were recruited from within the V-Chat rooms using signs placed in the space with URLs pointing to the web-based survey. The survey asked for a broad range of information, including demographic background, V-Chat usage patterns, and ratings of satisfaction with the V-Chat experience. These results supplemented the log data.


12.3 Results 


12.3.1 General V-Chat Usage 

In total, 35,024 unique user names appeared in the three V-Chat rooms over the 119 days, averaging five chat sessions each. The average session length, the span of time beginning when the person arrived in a room and ending when the person left the room, was 6.6 minutes. Of these users, 44% logged in only once. Those who logged in more than once had an average of eight sessions in the 119 days, and their session lengths averaged 6.4 minutes. Of all users, 23.1% were traditional IRC users, and 76.9% were V-Chat users.

Users were identified only by self-selected and non-persistent “handles” or user names. No email addresses, IP numbers or physical demographic data were available through the system. However, our survey data provide a picture of the basic demographic characteristics of the self-selected respondents. The average respondent was 29 years old; 72% were male and 28% female. Of all respondents, 68% had at least some college education, and 45% were single (55% were not). Most respondents were from the United States or Canada (70%), and many of the remaining respondents were from Europe (17%).

An examination of the chat sessions shows that people tended to visit the rooms 
in the afternoon, from 2pm to 8pm Pacific Standard Time (5pm to 11pm Eastern 
Standard Time) (Figure 12.2). Although we could not determine each user’s local 
time, most users were from the United States or Canada, so their local times fell 
within the range of PST to EST. Afternoon use peaked sharply on Thursday 
afternoons and dropped on Saturday afternoons. 
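Figure 12.2 is, in effect, a two-way histogram of session start times. A minimal sketch, assuming session start times have already been extracted from the logbot files and converted to Pacific time; the chapter does not describe the log format, so the record layout and sample timestamps here are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Locale-independent weekday labels; datetime.weekday() returns 0 for Monday.
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def sessions_by_day_and_hour(starts):
    """Count sessions per (weekday, hour of day) cell, as plotted in Figure 12.2."""
    return Counter((DAY_NAMES[ts.weekday()], ts.hour) for ts in starts)

def share_in_hours(starts, first_hour=14, last_hour=20):
    """Fraction of sessions starting in [first_hour, last_hour), e.g. 2pm to 8pm PST."""
    if not starts:
        return 0.0
    inside = sum(1 for ts in starts if first_hour <= ts.hour < last_hour)
    return inside / len(starts)

# Hypothetical session start times, already in Pacific time.
session_starts = [
    datetime(1998, 12, 17, 14, 5),    # Thursday, 2pm
    datetime(1998, 12, 17, 15, 40),   # Thursday, 3pm
    datetime(1998, 12, 19, 10, 10),   # Saturday, 10am
]

counts = sessions_by_day_and_hour(session_starts)
print(counts.most_common())
print(share_in_hours(session_starts))
```

Using a fixed list of day labels rather than `strftime("%a")` keeps the bins independent of the machine's locale.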



Figure 12.2 Count of chat sessions depending on time of day and day of week, 
Pacific Standard Time (panels Sun, Mon, Tues, Wed, Thurs, Fri, Sat). 


During each session, people posted an average of 3.4 messages. However, an 
unexpectedly large percentage of the people, 61.3%, posted no messages, 
observing others without participation. Session lengths were much shorter when 
users did not post any messages (3.1 minutes) than when they posted at least one 
(8.4 minutes). When people did speak, their utterances were fairly short, averaging 
23 characters, or approximately 5 words. 

Conversational openings were the most common form of exchange. An analysis of 
a subset of the data shows that of 31,529 messages posted, 23% contained some 
form of greeting (e.g., “hello,” “hiya,” “what’s up”) and 4% some form of goodbye 
(e.g., “bye,” “brb”). In addition, 14% of the messages included the name of one of 
the other people in the room. 
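The greeting and goodbye counts rest on keyword matching. The chapter gives only example phrases, so the keyword lists below are hypothetical and far shorter than whatever list was actually used; the sketch shows the general approach of whole-word, case-insensitive matching.

```python
import re

# Hypothetical keyword lists built from the examples in the text.
GREETINGS = ["hello", "hi", "hiya", "hey", "what's up"]
GOODBYES = ["bye", "brb", "later"]

def _pattern(phrases):
    """One case-insensitive regex matching any phrase as a whole word or phrase."""
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)

GREETING_RE = _pattern(GREETINGS)
GOODBYE_RE = _pattern(GOODBYES)

def opening_closing_shares(messages):
    """Return (share with a greeting, share with a goodbye) over a list of messages."""
    if not messages:
        return 0.0, 0.0
    n = len(messages)
    greetings = sum(1 for m in messages if GREETING_RE.search(m))
    goodbyes = sum(1 for m in messages if GOODBYE_RE.search(m))
    return greetings / n, goodbyes / n

sample = ["Hiya everyone!", "what's up Max", "this room is fun", "gotta go, brb"]
print(opening_closing_shares(sample))
```

The lookarounds stop short keywords from matching inside longer words, so "hi" is not counted in "this".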


12.3.2 Use of 3D V-Chat Features 

Do people use the 3D features of graphical chats? If so, is that use sustained? It 
is important to consider the possibility that people might not use the 3D features at 
all, focusing for the most part on the text chat component of the program, or that 
they might use the 3D features initially for the sake of novelty and use them less 
as the novelty wears off. How are 3D features actually used as a component of 
social interactions? People might play with gestures and move around the 3D 
spaces without incorporating gestures and movement into their social interactions. 

V-Chat users reported using both the text window to chat with others and the 
3D features of V-Chat. In the survey, 76% of the people reported paying equal 
attention to the text window and the graphics window, 14% mostly paid attention 
to the text, and 10% mostly looked at the graphics. However, such self-report data 
given to the V-Chat providers are prone to bias from sampling (perhaps only avid 
V-Chat users bothered to answer the questionnaire) and from demand 
characteristics, whereby respondents feel compelled to report using the 3D 
features out of a desire to be good subjects. We therefore examined the log data to 
determine whether people used the 3D features, and whether they used them as 
part of social interactions. 

The three most prominent 3D features are the customizability of the avatars, the 
avatar gestures, and the position and orientation of the avatars. The following 
sections examine each of these features. 

12.3.2.1 Avatars 

People were able either to use one of 20 standard avatars provided by the V-Chat 
system, create one themselves, or use one created and made publicly available by 
another V-Chat user. A total of 1979 unique avatars were used, 99% of them 
custom made. V-Chat users wore a custom avatar for 45% of all the V-Chat 
sessions. Custom avatars ranged from simple, square photographs to complex 
cartoon-like characters. Overall, about 31% of the users wore a custom avatar at 
least once. According to the survey data, people reported using custom avatars to 
express their individuality (42%), to stand out (24%), because they did not like the 
common avatars (23%), and for the challenge (11%). Two-thirds of the people 
claimed they had avatars that represented their true gender. 

Frequent users were much more likely than infrequent users to have used a 
custom avatar at least once (Table 12.1). People did not tend to change avatars 
during sessions. For 74% of all sessions, only one avatar was used. People used an 
average of 1.8 unique avatars, and each avatar was used for an average of 3.6 
sessions. 


12.3.2.2 Gestures 

People were able to make their avatars perform one of seven gestures: angry, 
flirts, sad, shrugs, silly, smiles, and waves. As can be seen from Table 12.1, V-Chat 
users used the avatar gestures on average 0.49 times per minute, or about once 
every two minutes. Frequent users, those who had visited the V-Chat rooms more 
than 15 times in 119 days, used fewer gestures: roughly one every three to eight 
minutes (0.35 and 0.13 gestures per minute in the two most frequent groups). 
Given that the average session lasted less than 8 minutes, gestures do not appear 
to be a vital, sustained aspect of social interactions for frequent users. As can be 
seen from Figure 12.3, the most common gestures were silly and waves, followed 
by flirts and smiles. It is important to note that when people make custom avatars, 
they can associate any image with the gesture buttons. The images they associate 
with the gestures are somewhat constrained, however, because the gesture’s name 
appears in the chat window when the gesture button is clicked. 

Table 12.1 Usage of 3D features by V-Chat users, broken down by user’s number 
of sessions in 119 days. 

Sessions    Users    Gestures per min.    Positions per min.    % custom avatar 
1            9165          0.57                  5.9                  21% 
2 to 5      11105          0.53                  5.2                  25% 
6 to 15      4548          0.37                  4.6                  41% 
16 to 40     1517          0.35                  3.3                  62% 
>40           601          0.13                  2.0                  76% 
Total       26936          0.49                  5.2                  31% 


People may have used the silly gesture more frequently because three different, 
randomly chosen sequences represented silly, so it provided a humorous surprise 
for both the user and the observer. Friendly and positive gestures (silly, smiles, 
waves, and flirts) far outweighed (81%) conflictual or noncommittal gestures 
(shrugs, sad, and angry). 
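The per-group rates in Table 12.1 can be turned into average intervals between gestures and recombined into user-weighted means. The sketch below is our own arithmetic on the published table; the Total row was presumably computed from the raw logs, so the user-weighted means should come close to it but need not match exactly.

```python
# Rows of Table 12.1:
# (sessions band, users, gestures/min, positions/min, share with custom avatar)
TABLE_12_1 = [
    ("1",        9165,  0.57, 5.9, 0.21),
    ("2 to 5",   11105, 0.53, 5.2, 0.25),
    ("6 to 15",  4548,  0.37, 4.6, 0.41),
    ("16 to 40", 1517,  0.35, 3.3, 0.62),
    (">40",      601,   0.13, 2.0, 0.76),
]

total_users = sum(row[1] for row in TABLE_12_1)

def weighted_mean(column):
    """Mean of a column, weighting each sessions band by its number of users."""
    return sum(row[1] * row[column] for row in TABLE_12_1) / total_users

# Average minutes between gestures in each band (reciprocal of the rate).
minutes_per_gesture = {row[0]: 1 / row[2] for row in TABLE_12_1}

for band, minutes in minutes_per_gesture.items():
    print(f"{band:>8}: one gesture every {minutes:.1f} min")
print(f"users: {total_users}")
print(f"weighted gestures/min: {weighted_mean(2):.2f}")   # Total row: 0.49
print(f"weighted positions/min: {weighted_mean(3):.2f}")  # Total row: 5.2
print(f"weighted custom share: {weighted_mean(4):.2f}")   # Total row: 31%
```

The band counts sum exactly to the 26,936 in the Total row, and the reconstructed means (about 0.50 gestures and 5.16 positions per minute, and a 30% custom-avatar share) sit within a point or two of it. The two most frequent groups gesture roughly every 2.9 and 7.7 minutes.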


12.3.2.3 Positioning 

Proxemics is the study of animal territoriality [6]. All animals, including humans, 
exhibit some form of territoriality. Some species engage in direct physical contact 
with many others; others, like humans, are predominantly non-contact species. 
Many people make an effort to maintain a certain space and distance around 
them. 

Can the same proxemics be observed in graphical virtual environments as in 
physical spaces? That is, do people cluster together when interacting in graphical 
space much as they would in face-to-face interactions? Or is the graphical 
component ignored? How much do people orient to one another face-to-face? Do 
they maintain territorial buffers around themselves? If so, how do these buffers 
compare in size to those seen in physical settings? 

An overhead perspective of the 3D graphical space provides a means of 
visualizing the proxemics of social interactions. We plotted the locations of users 
as they moved through the V-Chat space (Figure 12.4), with an arrow indicating 
the direction of each avatar’s gaze. Reviewing these plots showed users moving 
into orientations that resembled conversation circles. 
Figure 12.3 Breakdown of gestures used by V-Chat users. 


People were able to move their avatars with the use of either the keyboard or a 
mouse. While movement was continuous in the eyes of the user, changes in the 
avatar’s position were only recorded once per second. As can be seen from Table 
12.1, people had an average of 5.2 new positions every minute, indicating that they 
spent about 8% of their time moving. As with the gestures, the rate of positioning 
was reduced for frequent users. 
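The step from positions per minute to time spent moving assumes that the logbot records at most one new position per second, and only while an avatar is moving, so a user in constant motion would log 60 positions per minute. Under that assumption (ours, inferred from the description of the position data), the conversion is a single division.

```python
SAMPLES_PER_MINUTE = 60   # assumption: at most one logged position per second

def fraction_of_time_moving(positions_per_minute):
    """Share of a minute spent moving, if each logged position marks one second of motion."""
    if not 0 <= positions_per_minute <= SAMPLES_PER_MINUTE:
        raise ValueError("rate must lie between 0 and 60 positions per minute")
    return positions_per_minute / SAMPLES_PER_MINUTE

# Overall rate and the two extreme groups from Table 12.1.
for label, rate in [("all users", 5.2), ("1 session", 5.9), (">40 sessions", 2.0)]:
    print(f"{label:>12}: {fraction_of_time_moving(rate):.1%} of time moving")
```

For all users this gives about 8.7%, close to the figure in the text, falling to about 3% for the most frequent visitors.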

It is possible that people were moving simply to get from one end of the room to 
another, rather than to approach and look at the people with whom they were talking. 
To test whether or not people approached and looked at the people with whom they 
conversed, we needed to know who was the target of their message. We 
determined the target of a message by examining the content of the message for the 
names of the other users in the room. A subset of the log files from the main lobby 
from 12/15/1998 to 12/19/1998 was analyzed for the text content of the messages. 
In this period 1481 V-Chat users visited the lobby. For each person, there was an 
average of 20 other people copresent in the room. Messages were classified as 
being targeted or not targeted, depending on whether or not they contained the 
name of one of the other people in the room. A surprisingly large number of 
messages were targeted (13.8%). 
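Targeted messages were identified by looking for the names of co-present users in the message text. A minimal sketch of that rule follows; the handles and messages are hypothetical, and whole-name, case-insensitive matching is our assumption about how names were matched.

```python
import re

def message_targets(message, speaker, copresent):
    """Names of co-present users, other than the speaker, that appear in the message."""
    found = set()
    for name in copresent:
        if name == speaker:
            continue
        # Whole-name match: the name may not be glued to other letters or digits.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, message, re.IGNORECASE):
            found.add(name)
    return found

def targeted_share(messages):
    """messages: (text, speaker, copresent set) tuples; share naming someone present."""
    if not messages:
        return 0.0
    hits = sum(1 for text, speaker, room in messages
               if message_targets(text, speaker, room))
    return hits / len(messages)

room = {"Ann", "Max", "Maxine"}
log = [
    ("lol max, nice hat", "Ann", room),
    ("hello all", "Ann", room),
    ("Ann: over here", "Max", room),
    ("anyone from Ohio?", "Maxine", room),
]
print(targeted_share(log))
```

A rule like this over-counts when a handle is also a common word, and it misses nicknames or misspellings, so the targeted share is best read as an approximation.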

For each person we calculated his or her average distance and orientation toward 
both targeted others and randomly selected others (selected from all of the people 
in the room at the time the targeted messages were produced). We calculated 
distances and angles of orientation using the position data provided by the logbot at 
the time of the message. 
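For each targeted message, the comparison pairs the distance to the target with the distance to one randomly chosen co-present user. A sketch, assuming 2D map coordinates and assuming the random other is drawn from everyone present except the speaker and the target (the chapter does not say whether the target was excluded); the snapshot is hypothetical.

```python
import math
import random

def distance(p, q):
    """Euclidean distance between two (x, y) map positions."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def target_vs_random_distance(positions, speaker, target, rng):
    """positions: name -> (x, y) at the moment the message was posted.
    Returns (distance to target, distance to a randomly chosen other user)."""
    candidates = sorted(n for n in positions if n not in (speaker, target))
    if not candidates:
        raise ValueError("no other user present to serve as a comparison")
    other = rng.choice(candidates)
    return (distance(positions[speaker], positions[target]),
            distance(positions[speaker], positions[other]))

# Hypothetical snapshot of three avatars on the room map.
snapshot = {"Ann": (0, 0), "Max": (3, 4), "Zed": (10, 0)}
print(target_vs_random_distance(snapshot, "Ann", "Max", random.Random(1998)))
```

Passing in a seeded `random.Random` keeps the random comparisons reproducible across runs.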




The Social Life of Small Graphical Chat Spaces 




As can be seen from Figure 12.5, people were standing closer to their target than 
to a randomly selected other (t(497) = 6.57, p < .001). Nonetheless, avatars kept 
some distance from the targeted others, suggesting the maintenance of personal 
territories. 



Figure 12.4 Top-down view of the proximity and orientation of V-Chat users. 


Orientation toward others was calculated as the angle between the vector from 
the first person to the second person and the vector of the first person’s gaze. 
Thus, if a person was looking directly at another, the angle of orientation would be 
0°; if the person was looking sideways relative to the other, the angle would be 
90°; and if the person was looking in the opposite direction, the angle would be 
180°. An examination of the histograms of angle of orientation shows that people 
were generally not looking at randomly selected others, but rather sideways 
relative to them (see Figure 12.6). Few people had their backs turned to randomly 
selected others. However, people tended to look toward the targets of their 
messages. 
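The orientation measure is the unsigned angle between a person's gaze vector and the vector pointing from that person to the other. A sketch, assuming 2D map positions and a gaze direction given as a vector; if the logs store a heading angle instead, the hypothetical helper `gaze_from_heading` converts it first.

```python
import math

def orientation_angle(pos_a, gaze_a, pos_b):
    """Angle in degrees between A's gaze and the direction from A to B.
    0 = looking straight at B, 90 = sideways, 180 = facing directly away."""
    to_b = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
    dot = gaze_a[0] * to_b[0] + gaze_a[1] * to_b[1]
    cross = gaze_a[0] * to_b[1] - gaze_a[1] * to_b[0]
    # atan2(cross, dot) is the signed angle; its magnitude lies in [0, 180] degrees.
    return math.degrees(abs(math.atan2(cross, dot)))

def gaze_from_heading(degrees):
    """Unit gaze vector from a heading measured counterclockwise from the x-axis."""
    rad = math.radians(degrees)
    return (math.cos(rad), math.sin(rad))

a, facing_east = (0, 0), (1, 0)
for b in [(5, 0), (0, 5), (-5, 0)]:
    print(b, round(orientation_angle(a, facing_east, b), 1))
```

Using `atan2` rather than `acos` of a normalized dot product avoids domain errors when rounding pushes the cosine slightly past 1 for nearly parallel vectors.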

On average, people were more oriented toward targeted others than non-targeted 
others (Ms = 63° and 72°, respectively, t(496) = 4.17, p < .001). 

Just as people tended to be looking more toward a targeted other than a randomly 
selected other, targets were more prone to look back than were randomly selected 
others (Ms = 68° and 75°, t(496) = 3.05, p < .005). 
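These tests compare two measurements taken on the same people (toward the target versus toward a random other), and the reported degrees of freedom are one fewer than a number of people, as a paired t-test would give. The chapter does not name the test, so treating these as paired t-tests is our assumption; a minimal sketch of the statistic with hypothetical angles:

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Paired (within-subjects) t statistic and degrees of freedom."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(first, second)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1

# Hypothetical per-person mean orientation angles (degrees).
to_target = [55.0, 60.0, 70.0, 65.0]
to_random = [70.0, 68.0, 80.0, 66.0]
t, df = paired_t(to_target, to_random)
print(f"t({df}) = {t:.2f}")   # negative t: smaller angles toward the target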

In addition to testing whether people approached and looked at others in the 3D 
space, we wanted to test whether people moved their avatars during the course of 
their conversations, or only before and after them. In other words, did people 
interleave chat messages and avatar movements? To measure this interleaving, we 
counted how often people moved their avatars between two consecutive 
messages. We found that, on average, people moved their avatars in 46% of the 
intervals between their messages. Perhaps more importantly, the number of 
messages posted in a session did not affect this proportion: people moved 
between messages as much in long conversations as in short ones. 
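The interleaving measure can be computed from the time-ordered sequence of a user's messages and moves within a session. A sketch, assuming each logged event has already been reduced to a `"msg"` or `"move"` label (a simplification of the real log records). Moves before the first message or after the last one are not counted, since they fall outside any interval between messages.

```python
def interleaving_rate(events):
    """events: time-ordered 'msg' / 'move' labels for one user's session.
    Returns the share of intervals between consecutive messages that contain
    at least one move, or None when the session has fewer than two messages."""
    gaps = 0
    gaps_with_move = 0
    seen_message = False
    moved_since_message = False
    for kind in events:
        if kind == "move":
            moved_since_message = True
        elif kind == "msg":
            if seen_message:
                gaps += 1
                if moved_since_message:
                    gaps_with_move += 1
            seen_message = True
            moved_since_message = False
    return gaps_with_move / gaps if gaps else None

session = ["move", "msg", "move", "move", "msg", "msg", "move", "msg", "move"]
print(interleaving_rate(session))
```

Averaging this rate per user, and then checking whether it varies with the number of messages in a session, corresponds to the two findings reported above.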
Figure 12.5 The average distance toward targeted persons and randomly selected 
non-targeted persons. Distances were measured in a map view of the V-Chat room 
using a 40 × 40 grid. People were standing closer to the people they were talking 
to. Error bars represent standard errors. 


Figure 12.6 Histograms of people’s angle of orientation relative to a randomly 
selected other (panel “Randomly Selected Other”) or relative to the target of a 
message (panel “Targeted Other”). A person looking directly at another would 
have an angle of 0°; a person looking directly away from another would have an 
angle of 180°. 

Overall, V-Chat users appear to be using the 3D features of the program to 
reproduce the social conventions of physical proxemics. 

People continued to use the 3D features over time; however, the rates of 
gesturing and positioning declined for frequent users. This reduction suggests that 
some initial use was due to novelty, which then wore off. On average, users 
changed their avatar about once across their sessions (1.8 unique avatars per user), 
and frequent users were more likely to have used a custom avatar at least once. 


12.3.3 Contrasting Text-only and Graphical Users 

Some indication of the impact of the 3D features on social interactions is provided 
by the survey data. When asked in an open-ended question what they liked best 
about V-Chat, a full 20% of users said they liked making and seeing avatars the 
most. Only 4% liked gestures the most, and only 6% mentioned the ability to move 
around. People generally thought that V-Chat was a good place to make friends 
and meet people of the opposite sex. However, the survey data do not provide an 
objective indication of the impact that the 3D features had on people’s interactions. 

One measure of the value of 3D features in contrast to text-only systems is the 
differential rate of return, length of stay and number of sessions. An important 
further contrast is that between active participants, who spoke at least once, and 
passive participants, who never spoke at all. 

As mentioned earlier, a surprising number of people (61.3%) merely observed the 
space, visiting without ever saying anything. 

As can be seen from Figure 12.7, V-Chat users were much more likely to return 
to the space than conventional IRC users, especially if they actively participated in 
the conversation. A logistic regression with the interaction entered as a 
cross-product term shows that the main effects of participation level and type of 
user are significant (β = 1.22, p < .0001, and β = 1.70, p < .0001, respectively). 



Figure 12.7 Rate of returns, average session length, and average number of 
sessions, depending on type of user and participation level. 

Although V-Chat users were more likely to return to the V-Chat space than IRC 
users, they did not spend more time on each session (Figure 12.7). Among active 
chatters, V-Chat users spent 1.9 minutes less per session than IRC users. This 
difference is significant (t(19298) = 3.03, p < .001). 
Although V-Chat users spent slightly less time online per session than IRC users, 
they tended to return to the space more frequently. Over the period studied, V-Chat 
users frequented the space many more times than did IRC users (t(34199) = 19.67, 
p < .001), especially if they were active participants (the type of user by 
participation level interaction is significant, t(34198) = 14.10, p < .001). See 
Figure 12.7. 

A comparison of traditional IRC users and V-Chat users indicates that V-Chat 
users were more likely to return to the V-Chat space than IRC users, and visited the 
space a greater number of times than the IRC users. However, the average 
duration of the V-Chat users’ sessions was almost two minutes less than that of the 
IRC users. It can be argued that return rates, number of sessions, and duration of 
sessions provide an indirect measure of the quality of social interaction. However, 
IRC users may be less inclined to return to the V-Chat space for reasons other than 
the quality of the interactions they experience there. For example, they may 
simply feel like outsiders when they realize that many of the other users have 
bodies while they do not. The quantity of social interaction provides another 
possible measure of its quality. 



Figure 12.8 The average relative orientation of randomly selected others toward 
an actor, depending on whether the actor was wearing a custom avatar or a 
standard avatar (x-axis: actor’s avatar, standard vs. custom). 


An examination of the number of messages per minute indicates that active IRC 
users tend to speak more than active V-Chat users (Table 12.2). (We focused on 
active V-Chat users because use of the 3D features will not affect the quality of 
social interactions for people who only observe the space.) 

These results suggest that IRC users have a greater quantity of social interaction 
than V-Chat users. However, we were interested in whether the use of the 3D 
features directly affected the quantity of social interactions. As can be seen from 
Table 12.3, V-Chat users who used the 3D features at a greater rate posted more 
messages per minute. The rates of movement and of avatar changes had the most 
substantial correlations with messages posted per minute. 


Table 12.2 Means and standard deviations for messages posted, broken down by 
type of user. Only active users were included in the calculations. 

Type of user    Messages per minute 
                Mean      SD 
IRC             3.37      8.12 
V-Chat          0.78      1.41 


Table 12.3 Correlations between the use of 3D features and the messages posted 
for active V-Chat users. All correlations are significant at the p < .005 level. 

Use of 3D features       Correlation with messages per minute 
Gestures per minute      0.22 
Positions per minute     0.50 
Avatars per minute       0.51 


Thus, while IRC users tend to exhibit more chat behavior overall, V-Chat users 
who use the 3D features at a greater rate also show higher levels of chat behavior. 
However, given that these data are correlational in nature, we cannot make strong 
causal inferences. The use of 3D features may be increasing the quantity of 
messages; alternatively, the quantity of messages may in some way be increasing 
the use of 3D features, or some third variable, such as general activity level, may be 
causing increases in both. 
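The coefficients in Table 12.3 are presumably Pearson correlations computed across active V-Chat users; the chapter does not name the coefficient, so this is an assumption. A self-contained sketch with invented per-user values; on Python 3.10 or later, `statistics.correlation` computes the same quantity.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented per-user rates for four active V-Chat users.
positions_per_min = [1.0, 3.0, 4.0, 6.0]
messages_per_min = [0.2, 0.5, 0.4, 1.1]
print(round(pearson_r(positions_per_min, messages_per_min), 2))
```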

We argued that positioning would enhance social interactions because it allows 
people to indicate the direction of their attention. If V-Chat users are using eye 
gaze and distance to indicate the direction of their messages, then they should need 
to address the target of their message by name less frequently than standard IRC 
users. As predicted, we found that while 14% of all messages from V-Chat users 
were targeted by including the name of someone in the chat room, 26% of all 
messages from IRC users were. A logistic regression indicates that this difference 
is significant (b = .79, p < .001). 
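With a single binary predictor (type of user), the logistic coefficient is the log of the odds ratio between the two groups. Recomputing it from the rounded percentages, and assuming IRC users are coded 1 (the coding is not stated), gives a value close to the reported b = .79; the small gap is consistent with the percentages having been rounded.

```python
import math

def log_odds_ratio(p_group1, p_group0):
    """Logistic coefficient for a single 0/1 predictor: log(odds1 / odds0)."""
    odds1 = p_group1 / (1 - p_group1)
    odds0 = p_group0 / (1 - p_group0)
    return math.log(odds1 / odds0)

b = log_odds_ratio(0.26, 0.14)   # IRC (26% targeted) vs V-Chat (14% targeted)
print(f"b = {b:.2f}, odds ratio = {math.exp(b):.2f}")
```

An odds ratio of about 2.2 means that the odds of an IRC message naming someone in the room were roughly twice those of a V-Chat message.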

We also argued that avatars would enhance social interactions because people 
would be able to communicate information about themselves more effectively if 
they were able to represent themselves visually. Users reported feeling that they 
stood out more and were able to express themselves better if they had a custom 
avatar. If a custom avatar helps people stand out and project a richer presence, then 
others should look at them more than when they wear a standard avatar. 

An examination of Figure 12.8 illustrates that randomly selected others were 
more likely to be looking at a person if he or she was wearing a custom avatar than 
if he or she was wearing a standard avatar. A within-subjects analysis shows that 
the difference in others’ orientation is highly significant (t(121) = 7.99, p < .001). 
That the same person receives more attention when wearing a custom avatar than 
when wearing a standard avatar suggests that the use of custom avatars 
significantly affects the quality of people’s social interactions. 


12.4 Conclusions and Discussion 

Log file analysis of user behavior can illustrate the dynamics and structure of 
social cyberspaces. These spaces are novel environments for interaction that host 
familiar social norms and processes. The present research shows that people use 
the 3D features of graphical chat. However, the use of such 3D features tends to be 
reduced among frequent users. Spatial management of interaction occurs in a 
manner very similar to that in physical interactions, suggesting that proximity and 
orientation information are valuable additions to network interaction media. 
People tended to be standing near and looking toward those with whom they spoke. 
At the same time, they maintained some personal space. A comparison of V-Chat 
users to IRC users showed that V-Chat users were more likely to return to the 
V-Chat space and returned more frequently, but did not stay as long. Traditional IRC 
users posted many more messages than V-Chat users. However, among V-Chat 
users, the use of 3D features correlated positively with the quantity of messages 
posted. V-Chat users tended to have fewer targeted messages than traditional IRC 
users, suggesting that avatar positioning provided a non-verbal indication of 
attention similar to that found in face-to-face interactions. An examination of 
avatar usage indicates that people used about two distinct avatars across their 
sessions, that frequent users were more likely to have used custom avatars, and that 
when people used custom avatars, others were more likely to be looking at them. 

The present research has several limitations. Many of the findings presented here 
are correlational. Further experimental studies that allow for tighter control of user 
conditions are necessary to draw any causal conclusions. The possibility that 
different people used the same names in different sessions is a very real one, as is 
the possibility that individuals used multiple user names in the same or different 
sessions. The invisibility of private interactions in the form of whispers resolved an 
ethical concern in the research but reduced our ability to gauge the volume of 
interaction and reduced the indicators of interaction ties between users. The present 
research compares traditional IRC users to V-Chat users; however, the IRC users 
studied were those present in the V-Chat space. It would have been better to 
compare V-Chat users to IRC users who did not interact with V-Chat users. Future 
work should focus on contrasts between various graphical systems to explore the 
ways design decisions affect social interaction. 

Despite these limitations, the present research does suggest that people use the 
3D features of V-Chat and that the use of such features enhances social 
interactions. While 43% of the people who visited the V-Chat spaces did so only 
once, this rate is not out of line with the retention rates of many online systems. In 
addition, although frequent users were less likely to use some of the 3D features, 
even expert users continued to make use of proximity and orientation features to 
enhance their interactions in the space. V-Chat users posted significantly fewer 
messages than traditional IRC users, which may indicate that they found proxemic 
modes of communication sufficient to convey their intent to one another. Graphical 
representations, therefore, are used and may enhance social interaction in online 
spaces in many ways. 

This research suggested important directions for future work. Producing the data 
set and analysis tools used in creating this research highlighted another important 
concept: many of the issues we were concerned with are of interest and value to the 
end user while in the midst of interaction. We came to think of this work and the 
data we generated as a form of a “social accounting” system. This system could 
track the number of sessions users had in each space and how often they interacted 
with others. Future work will explore the effects of presenting such data in the user 
interfaces of such spaces in real time. We believe that social accounting data will 
add an important layer of context and history to online interaction environments 
that will improve their capacity to generate social cohesion. 
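The social accounting idea can be made concrete with a small in-memory store that tallies sessions per user and space, and targeted messages between users. The class and method names below are hypothetical; this is only a sketch of the kind of data such a system might surface in a chat interface, not a description of an existing implementation.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class SocialAccounts:
    """Hypothetical in-memory store for per-space 'social accounting' data."""
    sessions: Counter = field(default_factory=Counter)      # (space, user) -> sessions
    interactions: Counter = field(default_factory=Counter)  # (space, sender, target) -> messages

    def record_session(self, space, user):
        self.sessions[(space, user)] += 1

    def record_targeted_message(self, space, sender, target):
        self.interactions[(space, sender, target)] += 1

    def summary(self, space, user, top=3):
        """Session count and most frequent conversation partners for one user."""
        partners = Counter({other: n for (s, u, other), n in self.interactions.items()
                            if s == space and u == user})
        return {"sessions": self.sessions[(space, user)],
                "top_partners": partners.most_common(top)}

accounts = SocialAccounts()
accounts.record_session("lobby", "Ann")
accounts.record_session("lobby", "Ann")
accounts.record_targeted_message("lobby", "Ann", "Max")
accounts.record_targeted_message("lobby", "Ann", "Max")
accounts.record_targeted_message("lobby", "Ann", "Zed")
print(accounts.summary("lobby", "Ann"))
```

Keying every count by space keeps each room's history separate, so an interface could show a user's standing in the room they are currently in.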